
Knowledge  Systems  Laboratory 
Report  No.  KSL-86-22 


March  1986 


The  CAOS  System 


Eric  Schoen 


Department  of  Computer  Science 
Stanford  University 


Abstract 

The CAOS system is a framework designed to facilitate the development of highly concurrent real-time
signal interpretation applications. It explores the potential of multiprocessor architectures to
improve the performance of expert systems in the domain of signal interpretation.

CAOS is implemented in Lisp on a (simulated) collection of processor-memory sites, linked by a
high-speed communications subsystem. The “virtual machine” on which it depends provides remote
evaluation and packet-based message exchange between processes, using virtual circuits known as
streams. To this presentation layer, CAOS adds (1) a flexible process scheduler, and (2) an
object-centered notion of agents, dynamically-instantiable entities which model interpreted signal features.

This report documents the principal ideas, programming model, and implementation of CAOS.
A model of real-time signal interpretation, based on replicated “abstraction” pipelines, is presented.
For some applications, this model offers a means by which large numbers of processors may be utilized
without introducing synchronization-necessitated software bottlenecks.

The report concludes with a description of the performance of a large CAOS application over
various sizes of multiprocessor configurations. Lessons about problem decomposition grain size,
global problem solving control strategy, and appropriate services provided to CAOS by the underlying
architecture are discussed.
Chapter 1

Introduction  and  Overview 


This report documents the CAOS system, a portion of a recent experiment investigating the potential
of highly concurrent computing architectures to enhance the performance of expert systems. The
experiment focuses on the migration of a portion of an existing expert system application from a
sequential uniprocessor environment to a parallel multiprocessor environment.¹

The application, called ELINT, is a portion of a multi-sensor information fusion system, and was
written originally in AGE [2], an expert system development tool based on the blackboard paradigm.
For the purposes of this experiment, ELINT was reimplemented in CAOS, an experimental concurrent
blackboard framework based on the explicit exchange of messages between blackboard agents.

CAOS,  in  turn,  relies  on  services  provided  by  the  underlying  machine  environment.  In  the  present 
set  of  experiments,  the  environment  is  a  simulation  of  a  concurrent  architecture,  called  CARE  [5]. 
CARE  simulates  a  square  grid  of  processing  nodes,  each  containing  a  Lisp  evaluator,  private  memory, 
and a communications subsystem; message-passing is the only means of interprocessor communication.

CAOS  is  principally  an  operating  system,  controlling  the  creation,  initialization,  and  execution 
of  independent  computing  tasks  in  response  to  messages  received  from  other  tasks.  Figure  1.1 
illustrates  the  relationship  between  the  various  software  components  of  the  experiment. 

The following chapter briefly describes the salient features of the CARE environment. Chapter 3
discusses the ideas behind the CAOS framework. Chapter 4 summarizes the CAOS programming
environment, and Chapter 5 describes its implementation. The final chapter details the results of
our experiments. Appendix A contains a simple CAOS example, and Appendix B presents
a detailed, low-level look at the implementation of CAOS.


¹ This research was supported by DARPA Contract F30602-85-C-0012, NASA Ames Contract NCC 2-220-S1, and
Boeing Contract W266875. Eric Schoen was supported by a fellowship from NL Industries.
Chapter 2


An  Overview  of  CARE 


CARE is a highly-parameterized and well-instrumented multiprocessor simulation testbed, designed
to aid research in alternative parallel architectures. It executes within Helios, a hierarchical,
event-driven simulator which has been described elsewhere [3].

A typical CARE architecture is a grid of processing sites, interconnected by a dedicated communications
network. For example, the research discussed in this paper was performed on square arrays
of hexagonally connected processors (i.e., each processor is connected to six of its eight nearest
neighbors, excluding processors at the edges of the grid).

Each processing site consists of an evaluator, a general-purpose processor/memory pair, and
an  operator,  a  dedicated  communications  and  process  scheduling  processor  which  shares  memory 
with  the  evaluator.  Application-level  computations  take  place  in  the  evaluator,  a  component  which 
is  treated  as  a  “black  box”  Lisp  processor.  No  portion  of  its  interior  is  simulated;  the  host  Lisp 
machine  serves  as  the  evaluator  in  each  processing  site.  The  operator  performs  two  duties.  As  a 
communications  processor,  it  is  responsible  for  routing  messages  between  processing  sites.  As  a 
scheduling  processor,  it  queues  application-level  processes  for  execution  in  the  evaluator  (we  discuss 
the  scheduling  mechanism  in  greater  detail  below).  The  operator  is  simulated  and  instrumented  in 
great  detail. 

CARE allows a number of parameters of the processor grid to be adjusted. Among these parameters
are: the speed of the evaluator, the speed of the communications network, and the speed of
the  process-switching  mechanism.  By  altering  these  parameters,  a  single  processor  grid  specification 
can  be  made  to  simulate  a  wide  variety  of  actual  multiprocessor  architectures.  For  example,  we  can 
experiment  with  the  optimal  level-of-granularity  of  problem  decomposition  by  varying  the  speed  of 
both  process-switching  and  communications. 

Finally, CARE provides detailed displays of such information as evaluator, operator, and communication
network utilization, and process scheduling latencies. This instrumentation package
informs developers of CARE applications of how efficiently their systems make use of the simulated
hardware. 


2.1  The  CARE  Programming  Model 

CARE  programs  are  made  up  of  processes  which  communicate  by  exchanging  messages.  Messages 
flow across streams, virtual circuits maintained by CARE. The following services are used by CAOS:

New Process: Creates a new process on a specified site, running a specified top-level function. A
new stream is returned, enabling the “parent” of the process to communicate with its “child.”
Pointers  to  the  stream  may  be  exchanged  freely  with  other  known  processes  on  other  sites. 

New  Stream:  Creates  a  new  stream  whose  target  is  the  creating  process. 

Post  Packet:  Sends  a  message  across  a  specified  stream  to  a  remote  process. 

Accept  Packet:  Returns  the  next  message  waiting  on  a  specified  stream.  If  no  message  is  waiting 
when  this  operation  is  invoked,  the  invoking  process  is  suspended  and  moved  into  the  operator 
to  await  the  arrival  of  a  message. 

Memory in each processing site is private. Ordinarily, intra-memory pointers may not be exchanged
with processes in other sites. However, any pointer may be encapsulated in a remote-address,
and may then be included in the contents of a message between sites. A remote address
does  not  permit  direct  manipulation  of  remote  structures;  instead,  it  allows  a  process  in  one  site  to 
produce  a  local  copy  of  a  structure  in  another  site. 

Scheduling on a CARE node is entirely cooperative, and is based on message-passing. The message
exchange primitives post-packet and accept-packet form the basis of process scheduling. A
process wishing to block (yield control of the evaluator) does so by calling accept-packet to wait
for a packet to arrive on a stream. The application program’s scheduler awakens the process by
calling post-packet to send a packet to the stream. The process is placed on the queue of processes
waiting for the evaluator, and eventually regains control. The CAOS scheduler, which we describe
in Section 5.3, is implemented in terms of this paradigm.
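The blocking behavior described in this section can be illustrated with a small sketch. It is written in Python rather than the report's Lisp, and it is not CARE code: the `Stream` and `Scheduler` classes, the `collector` helper, and the convention that a process is a generator which yields the stream it wants to wait on are all assumptions made for this illustration. Only the operation names (new process, post packet, accept packet) come from the text.

```python
from collections import deque


class Stream:
    """A queue of packets addressed to one process (hypothetical stand-in for a CARE stream)."""

    def __init__(self):
        self.packets = deque()
        self.waiter = None  # the process blocked on this stream, if any


class Scheduler:
    """Toy model of one site's operator: it queues processes for the evaluator."""

    def __init__(self):
        self.ready = deque()  # (process, packet to deliver) pairs

    def new_process(self, body):
        """Start `body` as a process and return the stream used to reach it."""
        stream = Stream()
        self.ready.append((body(stream), None))
        return stream

    def post_packet(self, stream, packet):
        """Send a packet; if a process is blocked on the stream, make it runnable."""
        if stream.waiter is not None:
            process, stream.waiter = stream.waiter, None
            self.ready.append((process, packet))
        else:
            stream.packets.append(packet)

    def run(self):
        """Run ready processes until every process is blocked or finished."""
        while self.ready:
            process, packet = self.ready.popleft()
            try:
                # Resume the process; it runs until its next accept-packet,
                # which this sketch expresses as `yield stream`.
                stream = process.send(packet)
            except StopIteration:
                continue
            if stream.packets:
                self.ready.append((process, stream.packets.popleft()))
            else:
                stream.waiter = process  # block until a packet is posted


def collector(log):
    """Build a process body that records every packet until it sees 'stop'."""
    def body(inbox):
        while True:
            packet = yield inbox  # accept-packet: give up the evaluator
            if packet == "stop":
                return
            log.append(packet)
    return body
```

In this sketch a process that finds its stream empty is parked on the stream and consumes no evaluator time; a later `post_packet` moves it back onto the ready queue, which mirrors the cooperative scheme described above.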
Chapter 3


The  CAOS  Framework 


CAOS is a framework which supports the execution of multiprocessor expert systems. Its design
is predicated on the belief that future parallel architectures will emphasize limited communication
between processors rather than uniformly-shared memory. We expected such an architecture would
favor coarse-grained problem decomposition, with little or no synchronization between processors.
CAOS is intended for use in real-time data interpretation applications, such as continuous speech
recognition, passive radar and sonar interpretation, etc. [7,11].

A CAOS application consists of a collection of communicating agents, each responding to a number
of application-dependent, predeclared messages. An agent retains long-term local state. Furthermore,
an arbitrary number of processes may be active at any one time in a single agent.

Whereas the uniprocessor blackboard paradigm usually implies pattern-directed, demon-triggered
knowledge source activation, CAOS requires explicit messaging between agents; the cost
of automatically communicating changes in the blackboard state, as required by the traditional
blackboard mechanism, could be prohibitive in the distributed-memory multiprocessor
environment. Thus, CAOS is designed to express parallelism at a very coarse grain size, at the
level  of  knowledge  source  invocation  in  a  traditional  uniprocessor  blackboard  system.  It  supports 
no  mechanism  for  finer-grained  concurrency,  such  as  within  the  execution  of  agent  processes,  but 
neither  does  it  rule  it  out.  For  example,  we  could  easily  imagine  the  methods  which  implement  the 
messages  being  written  in  QLisp  [8],  a  concurrent  dialect  of  Common  Lisp. 

3.1  The  Structure  of  CAOS  Applications 

A CAOS application is structured to achieve high degrees of concurrency in two principal manners:
pipelining and replication. Pipelining is most appropriate for representing the flow of information
between levels of abstraction in an interpretation system; replication provides means by which the
interpretation  system  can  cope  with  arbitrarily  high  data  rates. 


3.1.1  Pipelining 

Pipelining is a common means of parallelizing tasks through a decomposition into a linear sequence
of independent stages. Each stage is assigned to a separate processing unit, which receives the
output from the previous stage and provides input to the next stage. Optimally, when the pipeline
reaches a steady state, each of its processors is busy performing its assigned stage of the overall
task.

CAOS promotes the use of pipelines to partition an interpretation task into a sequence of interpretation
stages, where each stage of the interpretation is performed by a separate agent. As data
enters  one  agent  in  the  pipeline,  it  is  processed,  and  the  results  are  sent  to  the  next  agent.  The  data 
input  to  each  successive  stage  represents  a  higher  level  of  abstraction. 

Advantages  of  Pipelining 

Sequential  decomposition  of  a  large  task  is  frequently  very  natural.  Structures  as  disparate  as 
manufacturing  assembly  lines  and  the  arithmetic  processors  of  high-speed  computing  systems  are 
frequently based on this paradigm.

Pipelining provides a mechanism whereby concurrency is obtained without duplication of mechanism
(that is, machinery, processing hardware, knowledge, etc.). In an optimal pipeline of n processing
elements, element 1 is performing work on task t + n - 1 when element 2 is working on task
t + n - 2, and so on, such that element n is working on task t. As a result, the throughput of the
pipeline is n times the throughput of a single processing element in the pipeline.
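The steady-state picture above can be checked with a short Python sketch (Python is used here for convenience; it is not the report's Lisp). The function names and the assumption that every stage takes exactly one time step are ours, chosen to model the "optimal pipeline" of the text; stages and tasks are numbered from 0 in the code.

```python
def pipeline_schedule(n_stages, n_tasks):
    """For each time step, list the task held by each stage (None when idle).

    Assumes unit-time stages: task k enters stage s at step k + s.
    """
    steps = []
    for step in range(n_tasks + n_stages - 1):
        row = []
        for stage in range(n_stages):
            task = step - stage
            row.append(task if 0 <= task < n_tasks else None)
        steps.append(row)
    return steps


def speedup(n_stages, n_tasks):
    """Time for one element to do all stages of all tasks, over pipelined time."""
    sequential = n_stages * n_tasks
    pipelined = n_tasks + n_stages - 1
    return sequential / pipelined
```

With three stages, the row for step 2 is `[2, 1, 0]`: the first element is two tasks ahead of the last, as in the text. Under these assumptions the speedup approaches the number of stages only as the number of tasks grows, because the pipeline must first fill and finally drain.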

In  the  case  of  CAOS  applications,  the  individual  agents  which  compose  an  interpretation 
“pipeline”  are  themselves  simple,  but  the  overall  combination  of  agents  may  be  quite  complex. 

Disadvantages  of  Pipelining 

Unfortunately,  it  is  often  the  case  that  a  task  cannot  be  decomposed  into  a  simple  linear  sequence 
of  subtasks.  Some  stage  of  the  sequence  may  depend  not  only  on  the  results  of  its  immediate 
predecessor,  but  also  on  the  results  of  more  distant  predecessors,  or  worse,  some  distant  successor 
(e.g., in feedback loops). An equally disadvantageous decomposition is one in which some of the
processing  stages  take  substantially  more  time  than  others.  The  effect  of  either  of  these  conditions 
is  to  cause  the  pipeline  to  be  used  less  efficiently.  Both  these  conditions  may  cause  some  processing 
stages  to  be  busier  than  others;  in  the  worst  case,  some  stages  may  be  so  busy  that  other  stages 
receive no work at all. As a result, the n-element pipeline achieves less than an n-times increase in
throughput.  We  discuss  a  possible  remedy  for  this  situation  in  the  following  section. 

3.1.2  Replication 

Concurrency gained through replication is ideally orthogonal to concurrency gained through pipelining.
Any size processing structure, from individual processing elements to entire pipelines, is a
candidate for replication. Consider a task which must be performed on average in time t, and a
processing structure which is able to perform the task in time T, where T > t. If this task were
actually a single stage in a larger pipeline, this stage would then be a bottleneck in the throughput of
the pipeline. However, if the single processing structure which performed the task were replaced by
T/t copies of the same processing structure, the effective time to perform the task would approach
t, as required.
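As a worked illustration of the T/t argument, the following Python sketch computes how many copies are needed. The function names are hypothetical, and rounding T/t up to a whole number of copies is our assumption, since a fractional copy cannot be built.

```python
import math


def copies_needed(service_time, required_time):
    """Smallest whole number of copies whose combined average time per task
    (service_time / copies) does not exceed required_time."""
    if service_time <= 0 or required_time <= 0:
        raise ValueError("times must be positive")
    return max(1, math.ceil(service_time / required_time))


def effective_time(service_time, copies):
    """Average time per task when `copies` identical structures share the load evenly."""
    return service_time / copies
```

For example, a structure with T = 10 and a requirement of t = 3 needs four copies, giving an effective time of 2.5 per task. This idealized figure assumes the copies never wait on one another, which is exactly the assumption the discussion of disadvantages below calls into question.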

Advantages  of  Replication 

The  advantages  of  replicating  processing  structure  to  improve  throughput  should  be  clear;  n  times 
the  throughput  of  a  single  processing  structure  is  achieved  with  n  times  the  mechanism.  Replication 
is  more  costly  than  pipelining,  but  it  apparently  avoids  problems  associated  with  developing  a 
pipelined  decomposition  of  a  task. 

Disadvantages  of  Replication 

Our work leads us to believe that such replicated computing structures are feasible, but not without
drawbacks. Just as performance gains in pipelines are impacted by inter-stage dependencies,
performance  gains  in  replicated  structures  are  impacted  by  inter-structure  dependencies. 

Consider  a  system  composed  of  a  number  of  copies  of  a  single  pipeline.  Further,  assume  the 
actions of a particular stage in the pipeline affect each copy of itself in the other pipelines. In an
expert system, for example, a number of independent pieces of evidence may cause the system to
draw the same conclusion; the system designer may require that when a conclusion is arrived at independently
by different means, some measure of confidence in the conclusion is increased accordingly.
If the inference mechanism which produces these conclusions is realized as concurrently-operating
copies of a single inference engine, the individual inference engines will have to communicate among
themselves to avoid producing multiple copies of the same conclusions. A stringent consistency requirement
between copies of a processing structure decreases the throughput of the entire system,
since a portion of the system’s work is dedicated to inter-system communication.

3.2  An  Example 

We close this chapter by describing the organization of ELINT, illustrating the benefits and drawbacks
of the CAOS framework applied to this problem. ELINT is an expert system whose domain is the
interpretation of passively-observed radar emissions. Its goal is to correlate a large number of radar
observations into a smaller number of individual signal emitters, and then to correlate those emitters
into a yet smaller number of clusters of emitters. ELINT is meant to operate in real time; emitters
and clusters appear and disappear during the lifetime of an ELINT run. The basic flow of information
in  ELINT  is  through  a  pipeline  of  the  various  agent  types,  which  we  now  describe  in  detail. 

Observation  Reader 

The observation reader is an artifact of the simulation environment in which ELINT runs. Its purpose
is to feed radar observations into the system. The reader is driven off a clock; at each tick (1 ELINT
“time unit”), it supplies all observations for the associated time interval to the proper observation
handlers. This behavior is similar to that of a radar collection site in an actual ELINT setting.


Observation  Handler 

The  observation  handlers  accept  radar  observations  from  associated  radar  collection  sites  (in  the 
simulated  system,  the  observations  come  from  the  observation  reader  agent).  There  may  be  a  large 
number of observation handlers associated with each collection site. The collection site chooses to
which of its many observation handlers to pass an observation, based on some scheduling criterion
such as random choice or round-robin.

Each observation contains an externally-assigned number to distinguish the source of the observation
from other known sources (the observation id is usually, but not always, correct). In addition,
each observation contains information about the observed radar signal, such as its quality, strength,
line-of-bearing, and operating mode. The observation does not contain information regarding the
source’s speed, flight path, and distance; ELINT will attempt to determine this information as it
monitors the behavior of each source over time.

When  an  observation  handler  receives  an  observation,  it  checks  the  observation’s  id  to  see  if  it 
already  knows  about  the  emitter.  If  it  does,  it  passes  the  observation  to  the  appropriate  emitter 
agent  which  represents  the  observation’s  source.  If  the  observation  handler  does  not  know  about  the 
emitter,  it  asks  an  emitter  manager  to  create  a  new  emitter  agent,  and  then  passes  the  observation 
to  that  new  agent. 
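The handler's decision can be sketched in a few lines of Python (not the report's Lisp). The class and parameter names are hypothetical; `request_emitter` stands in for the message exchange with an emitter manager, and an emitter agent is modelled as a plain list that collects observations.

```python
class ObservationHandler:
    """Routes each observation to the emitter agent for its id."""

    def __init__(self, request_emitter):
        # request_emitter(emitter_id) -> emitter; models asking an emitter manager.
        self.request_emitter = request_emitter
        self.known = {}  # emitter id -> emitter agent already known to this handler

    def handle(self, observation):
        emitter_id = observation["id"]
        emitter = self.known.get(emitter_id)
        if emitter is None:
            # Unknown emitter: ask a manager for it, then remember its address.
            emitter = self.request_emitter(emitter_id)
            self.known[emitter_id] = emitter
        emitter.append(observation)  # models sending the observation to the agent
        return emitter
```

Caching the returned emitter means the manager is consulted only the first time a handler sees a given id; later observations with the same id go straight to the emitter.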

Emitter  Manager 

There  may  be  many  emitter  managers  in  the  system.  An  emitter  manager’s  task  is  to  accept 
requests  to  create  emitters  with  specified  id  numbers.  If  there  is  no  such  emitter  in  existence  when 
the request is received, the manager will create one and return its “address” to the requesting
observation handler. If there is such an emitter in existence when the request is received, the
manager will simply return its address to the requestor. This situation arises when one observation
handler requests an emitter that another observation handler had previously requested.

The reason for the emitter manager’s existence is to reduce the amount of inter-pipeline dependency
with respect to the creation of emitters. When ELINT creates an emitter, it is similar
to a typical expert system’s drawing a conclusion about some evidence; as discussed above, ELINT
must create its emitters in such a way that the individual observation handlers do not end up each
creating copies of the same emitter. Consider the following strategies the observation handlers could
use to create new emitters:

1.  The  handlers  could  create  the  emitters  themselves  immediately.  Since  the  collection  site 
may  pass  observations  with  the  same  id  to  each  observation  handler,  it  is  possible  for  each 
observation  handler  to  create  its  own  copy  of  the  same  emitter.  We  reject  this  method. 

2.  The  handlers  could  create  the  emitters  themselves,  but  inform  the  other  handlers  that  they’ve 
done  this.  This  scheme  breaks  down  when  two  handlers  try  simultaneously  to  create  the  same 
emitter. 

3. The handlers could rely on a single emitter manager agent to create all emitters. While this
approach  is  safe  from  a  consistency  standpoint,  it  is  likely  to  be  impractical,  as  the  single 
emitter  manager  could  become  a  bottleneck  in  the  interpretation. 


4.  The  handlers  could  send  requests  to  one  of  many  emitter  managers,  chosen  by  some  arbitrary 
method.  This  idea  is  nearly  correct,  but  does  not  rule  out  the  possibility  of  two  emitter 
managers  each  receiving  creation  requests  for  the  same  emitter. 

5.  The  handlers  could  send  requests  to  one  of  many  emitter  managers,  chosen  through  some 
algorithm which is invariant with respect to the observation id. This is in fact the algorithm
in use in ELINT. The algorithm for choosing which emitter manager to use is based on a
many-to-one mapping of observation ids to emitter managers.¹
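Strategy 5 can be sketched as follows, in Python rather than the report's Lisp. The modulo mapping is the one the report's footnote describes; the `EmitterManager` class, the `manager_for` function, and the dictionary used to represent an emitter are hypothetical.

```python
class EmitterManager:
    """Creates each emitter at most once and returns the existing one thereafter."""

    def __init__(self, name):
        self.name = name
        self.emitters = {}  # emitter id -> emitter created by this manager

    def request_emitter(self, emitter_id):
        if emitter_id not in self.emitters:
            self.emitters[emitter_id] = {"id": emitter_id, "manager": self.name}
        return self.emitters[emitter_id]


def manager_for(emitter_id, managers):
    """Pick the manager responsible for an id: the id modulo the number of managers."""
    return managers[emitter_id % len(managers)]
```

Because every handler computes the same manager for a given id, requests for that id are serialized through one agent, and no second manager can create a duplicate. The work is still spread over all managers, provided ids are reasonably evenly distributed.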

Emitters 

Emitters  hold  some  state  and  history  regarding  observations  of  the  sources  they  represent.  As  each 
new  observation  is  received,  it  is  added  to  a  list  of  new  observations.  On  a  regular  basis,  the  list 
of  new  observations  is  scanned  for  interesting  information.  In  particular,  after  enough  observations 
are  received,  the  emitter  may  be  able  to  determine  its  heading,  speed,  and  location.  The  first  time 
it  is  able  to  determine  this  information,  it  asks  a  cluster  manager  to  either  match  the  emitter  to 
an  old  cluster  or  create  a  new  cluster  to  hold  the  single  emitter.  Subsequently,  it  sends  an  update 
message  to  the  cluster  to  which  it  belongs,  indicating  its  current  course,  speed,  and  location. 

Emitters  maintain  a  qualitative  confidence  level  of  their  own  existence  (possible,  probable,  and 
positive).  If  new  observations  are  received  often  enough,  the  emitter  will  increase  its  confidence  level 
until  it  reaches  positive.  If  an  observation  is  not  received  in  the  expected  time  interval,  the  emitter 
lowers  its  confidence  by  one  step.  If  the  confidence  falls  below  possible,  the  emitter  “deletes”  itself, 
informing  its  manager,  and  any  cluster  to  which  it  is  attached. 
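The confidence rule can be written as a small state machine. This Python sketch makes two assumptions that the text does not state: a new emitter starts at "possible", and every timely observation raises the level by one step (the report says only that observations must arrive "often enough"). The class and method names are hypothetical.

```python
LEVELS = ["possible", "probable", "positive"]


class EmitterConfidence:
    """Tracks an emitter's qualitative confidence in its own existence."""

    def __init__(self):
        self.level = 0        # assumption: start at "possible"
        self.deleted = False

    def observation_received(self):
        """A timely observation raises confidence, up to 'positive'."""
        if not self.deleted and self.level < len(LEVELS) - 1:
            self.level += 1

    def observation_missed(self):
        """A missed interval lowers confidence; below 'possible' the emitter deletes itself."""
        if self.deleted:
            return
        if self.level == 0:
            self.deleted = True  # the real agent would also inform its manager and cluster
        else:
            self.level -= 1

    @property
    def name(self):
        return None if self.deleted else LEVELS[self.level]
```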

Cluster  Managers 

The  cluster  managers  play  much  the  same  role  in  the  creation  of  cluster  agents  as  the  emitter 
managers  play  in  the  creation  of  emitters.  However,  it  is  not  possible  to  compute  an  invariant  to 
be used as a many-to-one mapping of emitters to cluster managers. If ELINT were to employ multiple cluster
managers, the best strategy for choosing which of the many managers to use would still result in the
possible creation of multiple instances of the “same” cluster. Thus, we have chosen to run ELINT
with  a  single  cluster  manager.  Fortunately,  cluster  creation  is  a  rare  event,  and  the  single  cluster 
manager  has  never  been  a  processing  bottleneck. 

As  indicated  above,  requests  from  emitters  to  create  clusters  are  specified  as  match  requests 
over  the  extant  clusters.  Emitters  are  matched  to  clusters  on  the  basis  of  their  location,  speed,  and 
heading.  However,  the  cluster  manager  does  not  itself  perform  this  matching  operation.  Although  it 
knows  about  the  existence  of  each  cluster  it  has  created,  it  does  not  know  if  the  cluster  has  changed 
course,  speed,  and/or  direction  since  it  was  originally  created.  Thus,  the  cluster  manager  asks  each 
of  its  clusters  to  perform  a  match. 

If none of the clusters responds with a positive match, a new cluster is created for the
emitter; if one cluster responds positively, the emitter is added to the cluster, and is informed of
this fact; if more than one cluster responds positively, an error (or a mid-air collision) must have
occurred.
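The three outcomes of a match request can be sketched in Python. The report does not give the matching criteria beyond location, speed, and heading, so `course_matches`, its single `tolerance` parameter, and the dictionary fields are all hypothetical; the sketch also ignores heading wraparound at 360 degrees.

```python
def course_matches(cluster, emitter, tolerance=5.0):
    """Hypothetical test: heading, speed, and position all within `tolerance`."""
    return all(
        abs(cluster[key] - emitter[key]) <= tolerance
        for key in ("heading", "speed", "x", "y")
    )


def assign_emitter(emitter, clusters, matches=course_matches):
    """Ask every cluster for a match, then create, join, or signal an error."""
    hits = [cluster for cluster in clusters if matches(cluster, emitter)]
    if not hits:
        # No cluster matched: create a new one holding just this emitter.
        cluster = {key: emitter[key] for key in ("heading", "speed", "x", "y")}
        cluster["members"] = [emitter["id"]]
        clusters.append(cluster)
        return cluster
    if len(hits) == 1:
        hits[0]["members"].append(emitter["id"])
        return hits[0]
    raise RuntimeError("emitter matched more than one cluster")
```

In CAOS the match is performed by the cluster agents themselves, because only they know their current course; here the `matches` callable plays that role.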

¹ The algorithm computes the observation id modulo the number of emitter managers, and maps that number to
a particular manager.
Clusters


The radar emissions of clusters of emitters often indicate the actual behavior of the cluster. Cluster
agents, therefore, apply heuristics about radar signals to determine whether the behaviors of the
clusters they represent are threatening or not. This information, along with the course parameters
of each radar source, is the “output” of the ELINT system. A cluster will delete itself if all constituent
emitters  have  been  deleted. 




Chapter  4 

Programming in the CAOS Framework

CAOS is a package of functions on top of Lisp. These functions are partitioned into three major classes:

•  Those  which  declare  agents. 

•  Those which initialize agents.

•  Those  which  support  communication  between  agents. 

We  now  describe  the  CAOS  operators  for  each  of  these  classes. 

4.1  Declaration  of  agents 

Agents  are  declared  within  an  inheritance  network.  Each  agent  inherits  the  characteristics  of  its 
(multiple) parents. The simplest agent, vanilla-agent, contains the minimal characteristics required
of a functional CAOS agent. All other CAOS agents reference vanilla-agent either directly or
indirectly.  Another  predeclared  agent,  process-agenda-agent,  is  built  on  top  of  vanilla-agent, 
and  contains  a  priority  mechanism  for  scheduling  the  execution  of  messages. 

Application  agents  are  declared  by  augmenting  the  following  characteristics  of  the  base  or  other 
ancestral  agents: 

Local  Variables:  An  agent  may  refer  freely  to  any  variable  declared  local.  In  addition,  each  local 
variable  may  be  declared  with  an  initial  value. 

Messages: The only messages to which an agent may respond are those declared in this table. This
simplifies the task of a resource allocator, which must load application code onto each CARE
site.
(defagent agent-name (parent_1 ... parent_n)
  (localvars variable_1 ... variable_n)
  (messages message_1 ... message_n)
  (symbolically-referenced-agents agent_1 ... agent_n))

Figure 4.1: The basic form of defagent


Symbolically Referenced Agents: Some agents exist throughout a CAOS run. We call such agents
static, and we allow code in agent message handlers to reference such agents by name. Before
an agent begins running, each symbolic reference is resolved by the CAOS runtime system.

There are a number of additional characteristics; most of these are used by CAOS internally, and
we will document these in the next chapter.

The basic form for declaring a CAOS agent is defagent. It has the form illustrated by Figure 4.1.
The  first  element  in  each  sublist  is  a  keyword;  there  are  a  number  of  defined  keywords,  and  their 
use  in  an  agent  declaration  is  strictly  optional.  An  agent  inherits  the  union  of  the  keyword  values  of 
its  parents  for  any  unspecified  keyword.  Of  those  keywords  which  are  specified,  some  are  combined 
with  the  union  of  the  keyword  values  of  the  agent’s  parents,  and  others  supersede  the  values  in  the 
parents.  Figure  4.2  contains  the  declaration  of  the  emitter  agent,  one  of  the  most  complex  examples 
in ELINT.
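The inheritance rule for keyword values can be made concrete with a Python sketch. The report does not say which keywords are combined with the parents' values and which supersede them, so the `UNION_KEYWORDS` set below is purely an assumption for illustration, as are the function name and the dictionary representation of an agent declaration.

```python
# Assumption for illustration only: these keywords are combined with the
# parents' values; any other specified keyword supersedes the parents' values.
UNION_KEYWORDS = {"localvars", "messages"}


def effective_keywords(own, parents):
    """Compute an agent's keyword values from its declaration and its parents.

    `own` and each parent are dicts mapping a keyword name to a list of values.
    """
    result = {}
    # Start from the union of the parents' values for every keyword.
    for parent in parents:
        for key, values in parent.items():
            merged = result.setdefault(key, [])
            merged.extend(v for v in values if v not in merged)
    # Apply the agent's own keywords: combine or supersede.
    for key, values in own.items():
        if key in UNION_KEYWORDS:
            merged = result.setdefault(key, [])
            merged.extend(v for v in values if v not in merged)
        else:
            result[key] = list(values)
    return result
```

A keyword the agent leaves unspecified keeps the union of its parents' values, matching the first rule in the text; the other two rules depend on the assumed keyword classification.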

As we discuss in the next chapter, defagent forms are translated by CAOS into Flavors defflavor
forms [4]. CAOS messages are then defined using the defmethod function of ZetaLisp. These methods
are free to reference the local variables declared in the defagent expression.


4.2 Initialization of Agents

The initial CAOS configuration is specified by the caos-initialize operator, which takes the form
illustrated by Figure 4.3; for example, Figure 4.4 is ELINT's initialization form.

The first portion of the form creates the static agents. In Figure 4.4, a static agent named
el-gotcha-handler-1, an instance of the class el-observation-handler, is created on the CARE site
at coordinates (1, 2) in the processor grid.

The second portion of the form is a list of Lisp expressions to be evaluated sequentially when
CAOS's initialization phase is complete. Each expression is intended to send a message to one of the
static agents declared in the first part of the form. These messages serve to initialize the application;
in Figure 4.4, the initialization messages open log files and start the processing of ELINT observations.

Agents may also be created dynamically. The create-agent-instance function accepts an
agent class name and a location specification;1 the remote-address of the newly-created agent
is returned. While dynamically created agents may not be referenced symbolically, their
remote-addresses may be exchanged freely.


1 Currently, agents may be created at or near specified CARE sites. CAOS makes no attempt at dynamic load


(defagent el-emitter (process-agenda-agent)
  (localvars
    (process-agenda '(el-undo-collection-id-error
                      el-change-cluster-association
                      el-emitter-update-on-time-tick
                      el-initialize-emitter
                      el-update-emitter-from-observation))
    (last-observed -1000000)
    (cluster-manager 'cluster-manager-0)
    manager
    id
    type
    observed
    fixes
    last-heading
    last-mode
    confidence
    cluster
    new-observations-since-time-tick-flag
    id-errors
    gc-flag)
  (messages
    el-update-emitter-from-observation
    el-initialize-emitter
    el-change-cluster-association
    el-undo-collection-id-error)
  (symbolically-referenced-agents
    el-collection-reporter-0
    el-correlation-reporter-0
    el-threat-reporter-0
    el-cluster-manager-0
    el-cluster-manager-1
    el-cluster-manager-2
    el-big-ear-handler
    el-gotcha-handler
    el-emitter-trace-reporter-0))


Figure 4.2: The emitter agent


(caos-initialize
  ((agent-name_1 agent-class site-address)
   ...)
  ((initial-message_1)
   ...))

Figure 4.3: The basic CAOS initialization form

4.3  Communications  Between  Agents 

Agents communicate with each other by exchanging messages. CAOS does not guarantee that
messages reach their destinations: due to excessive message traffic or processing element failure,
messages may be delayed or lost during routing. It is the responsibility of the application program to
detect and recover from lost messages. Commensurate with the facilities provided by CARE,
messages may be tagged with routing priorities; however, higher-priority messages are not guaranteed
to arrive before lower-priority messages sent concurrently.

Two classes of messages are defined: those which return values (called value-desired messages),
and those which do not (called side-effect messages). Value-desired messages return
their values to a special cell called a future. Processes attempting to access the value of a future are
blocked until that future has had its value set. It is possible for the value of a future to be set more
than once, and it is possible for multiple processes to be waiting for the same future's value to be set.3
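A future of this kind can be approximated in a few lines. The sketch below is in Python rather than the report's Lisp and is not the CAOS implementation; the class and method names are assumptions chosen to mirror the prose. It shows the three properties just described: readers block until a value is set, several readers may wait on one future, and the value may be set again later.

```python
import threading


class Future:
    """A cell whose readers block until a value has been set at least once."""

    def __init__(self):
        self._cond = threading.Condition()
        self._is_set = False
        self._value = None

    def set(self, value):
        # Setting again simply overwrites the previous value.
        with self._cond:
            self._value = value
            self._is_set = True
            self._cond.notify_all()      # wake every waiting reader

    def satisfied(self):
        with self._cond:
            return self._is_set

    def value(self, timeout=None):
        # Block until the future is set (or the optional timeout expires).
        with self._cond:
            if not self._cond.wait_for(lambda: self._is_set, timeout):
                raise TimeoutError("future was never set")
            return self._value


fut = Future()
results = []
waiters = [threading.Thread(target=lambda: results.append(fut.value(5)))
           for _ in range(3)]
for t in waiters:
    t.start()
fut.set(42)          # releases all three waiting threads
for t in waiters:
    t.join()
fut.set(43)          # a second set is allowed
```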

4.3.1  Sending  messages 

The CARE primitive post-packet, which sends a packet from one process to another, is employed
in CAOS to produce three basic kinds of message sending operations:

post: The post operator sends a side-effect message to an agent. The sending process supplies
the name of or a pointer to the target agent, the message routing priority, the message name, and
arguments. The sender continues executing while the message is delivered to the target agent.

post-future: The post-future operator sends a value-desired message to the target agent. The
sending process supplies the same parameters as for post, and is returned a pointer to the
future which will eventually be set by the target agent. As for post, the sender continues
executing while the message is being delivered and executed remotely.

A  process  may  later  check  the  state  of  the  future  with  the  future-satisfied?  operator,  or 
access  the  future’s  value  with  the  value-future  operator,  which  will  block  the  process  until 
the  future  has  a  value. 

post-value: The post-value operator is similar to the post-future operator; however, the sending
process is delayed until the target agent has returned a value. post-value is defined in
terms of post-future and value-future.
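The relationship among the three operators can be sketched as follows, in Python rather than the report's Lisp. This is a toy model under stated assumptions, not the CAOS code: the Agent class, its mailbox thread, and the handler table are invented for illustration, routing priorities are omitted, and the future here can be set only once for brevity.

```python
import queue
import threading


class Future:
    def __init__(self):
        self._event = threading.Event()
        self._value = None

    def set(self, value):
        self._value = value
        self._event.set()

    def satisfied(self):            # plays the role of future-satisfied?
        return self._event.is_set()

    def value(self):                # plays the role of value-future; blocks
        self._event.wait()
        return self._value


class Agent:
    """Toy agent: one thread drains a mailbox and runs message handlers."""

    def __init__(self, handlers):
        self.handlers = handlers
        self.mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            name, args, future = self.mailbox.get()
            result = self.handlers[name](*args)
            if future is not None:  # value-desired message: return the value
                future.set(result)


def post(agent, name, *args):
    agent.mailbox.put((name, args, None))   # side-effect message; no reply


def post_future(agent, name, *args):
    future = Future()
    agent.mailbox.put((name, args, future))
    return future                           # the sender keeps running


def post_value(agent, name, *args):
    # Defined in terms of post-future and value-future, as in the text.
    return post_future(agent, name, *args).value()


log = []
adder = Agent({"add": lambda a, b: a + b, "note": log.append})
post(adder, "note", "hello")
pending = post_future(adder, "add", 2, 3)
total = post_value(adder, "add", 10, 20)
```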

3 Futures were also used in QLisp and Multilisp [9]. The HEP Supercomputer [6] implemented a simple version of
futures as a process synchronization mechanism.


(caos-initialize
  ((el-observation-reader-0 el-observation-reader (2 2))
   (el-big-ear-handler-1 el-observation-handler (1 1))
   (el-big-ear-handler-2 el-observation-handler (1 1))
   (el-gotcha-handler-1 el-observation-handler (1 2))
   (el-gotcha-handler-2 el-observation-handler (1 2))
   (el-emitter-manager-0 el-emitter-manager (2 1))
   (el-emitter-manager-1 el-emitter-manager (2 2))
   (el-collection-reporter-0 el-collection-reporter (1 2))
   (el-correlation-reporter-0 el-correlation-reporter (1 3))
   (el-threat-reporter-0 el-threat-reporter (1 3))
   (el-emitter-trace-reporter-0 el-emitter-trace-reporter (3 2))
   (el-cluster-trace-reporter-0 el-cluster-trace-reporter (3 1))
   (el-cluster-manager-0 el-cluster-manager (2 1)))
  ((post el-observation-reader-0 nil
         'el-open-observation-file
         *elint-data-file*)
   (post el-collection-reporter-0 nil
         'el-initialize-reporter t
         "elint:reports;collections.output")
   (post el-correlation-reporter-0 nil
         'el-initialize-reporter t
         "elint:reports;correlations.output")
   (post el-threat-reporter-0 nil
         'el-initialize-reporter t
         "elint:reports;threats.output")
   (post el-emitter-trace-reporter-0 nil
         'initialize-trace-reporter t
         "elint:reports;emitter.traces")
   (post el-cluster-trace-reporter-0 nil
         'initialize-trace-reporter t
         "elint:reports;cluster.traces")))


Figure 4.4: The initialization declaration for ELINT


4.3.2  Detecting  Lost  Messages 

It is possible to detect the loss of value-desired messages by attaching a timeout to the associated
future. The functions post-clocked-future and post-clocked-value are similar to their untimed
counterparts, but allow the caller to specify a timeout and a timeout action to be performed if the
future is not set within the timeout period. Typical actions include setting the future's value to
a default value, or resending the original message using the repost operator.
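The idea of a clocked future can be sketched as below, in Python rather than the report's Lisp. The class name, the use of a timer thread, and the use_default helper are assumptions made for illustration; only the "set a default value" action is shown, and resending with repost is not modeled.

```python
import threading


class ClockedFuture:
    """Future that runs a timeout action if no value arrives in time."""

    def __init__(self, timeout, on_timeout):
        self._event = threading.Event()
        self._value = None
        self._timer = threading.Timer(timeout, self._expire, args=(on_timeout,))
        self._timer.daemon = True
        self._timer.start()

    def _expire(self, on_timeout):
        if not self._event.is_set():    # still unset when the clock ran out
            on_timeout(self)

    def set(self, value):
        self._timer.cancel()            # a value arrived; stop the clock
        self._value = value
        self._event.set()

    def value(self):
        self._event.wait()
        return self._value


def use_default(default):
    """Timeout action: satisfy the future with a default value."""
    return lambda future: future.set(default)


lost = ClockedFuture(0.05, use_default("no-answer"))     # reply never arrives
answered = ClockedFuture(5.0, use_default("no-answer"))
answered.set("real-answer")                              # reply arrives in time
```

A process reading lost is released by the timeout action instead of blocking forever, which is how the application detects the lost message.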

4.3.3  Sending  to  Multiple  Agents 

There exist versions of the basic posting operators which allow the same message to be sent to
multiple agents.3 multipost sends a side-effect message to a list of agents; multipost-future and
multipost-value send a value-desired message to a list of agents. In the latter case, the associated
future is actually a list of futures; the future is not considered set until all target agents have
responded. The value of such a message is an association list; each entry in the list is composed of
an agent name or remote-address and the returned message value from that agent. There exist
clocked versions of these functions (called, naturally, multipost-clocked-future and
multipost-clocked-value) to aid in detecting lost multicast messages.
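The shape of a multicast reply can be sketched as follows, in Python rather than the report's Lisp. This is a toy model: the AGENTS table, the count message, and the use of a thread pool with the standard library's concurrent.futures objects in place of CAOS futures are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for agents: a name mapped to a table of message handlers.
AGENTS = {
    "el-cluster-manager-0": {"count": lambda: 3},
    "el-cluster-manager-1": {"count": lambda: 5},
}
_pool = ThreadPoolExecutor(max_workers=4)


class MultiFuture:
    """A list of per-agent futures, considered set only when all are set."""

    def __init__(self, futures_by_agent):
        self._futures = futures_by_agent       # list of (agent, future) pairs

    def satisfied(self):
        return all(f.done() for _, f in self._futures)

    def value(self):
        # Association list: one (agent, returned value) entry per target.
        return [(agent, f.result()) for agent, f in self._futures]


def multipost_future(agent_names, message, *args):
    pairs = [(name, _pool.submit(AGENTS[name][message], *args))
             for name in agent_names]
    return MultiFuture(pairs)


def multipost_value(agent_names, message, *args):
    return multipost_future(agent_names, message, *args).value()


answers = multipost_value(
    ["el-cluster-manager-0", "el-cluster-manager-1"], "count")
```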

4.4  Communications  Between  Processes 

Processes in each agent communicate using the shared local variables declared in the agent.
Besides sharing previously computed results this way, processes may also share the results of ongoing
computations.

Consider  the  following  scenario:  within  an  agent,  some  process  is  currently  computing  some 
answer. At the same time, another process begins executing, and realizes somehow that the answer
it  needs  to  compute  is  the  same  answer  the  other  process  is  already  computing.  The  second  process 
could  take  one  of  two  actions:  it  could  continue  computing  the  answer,  even  though  this  would 
mean  redundant  work,  or  it  could  wait  for  the  first  process  to  complete,  and  return  its  answer.  The 
second  approach  is  feasible,  but  it  does  tie  up  resources  in  the  form  of  an  idle  process. 

The CAOS operators attach and my-handle offer a third alternative. If a process
knows it may ultimately produce an answer needed by more than one requesting agent, it obtains
its "handle" (Section 5.4) by calling my-handle, and places it in a table for other processes to
reference. Any other process wishing to return the same answer as the first calls attach, with the
first process's handle as argument. The first process returns its answer to its own requesters and to
all requesting agents waiting for answers from the other processes, and the other processes return
no value at all.
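The attach mechanism can be pictured with the following sketch, in Python rather than the report's Lisp. It assumes that a handle carries the list of agents awaiting the process's answer; the Handle class, the in_progress table, and the finish helper are hypothetical names introduced for illustration and are not taken from CAOS.

```python
class Handle:
    """Stand-in for what my-handle returns: the answer targets of a process."""

    def __init__(self, requester):
        self.answer_targets = [requester]


def attach(own_handle, other_handle):
    """Hand our requesters to the process that owns other_handle."""
    other_handle.answer_targets.extend(own_handle.answer_targets)
    own_handle.answer_targets.clear()   # the attaching process returns nothing


def finish(handle, answer, deliver):
    """Return the answer to every requester now recorded on the handle."""
    for target in handle.answer_targets:
        deliver(target, answer)


in_progress = {}      # shared table: question -> handle of the working process
delivered = []


def record(target, answer):
    delivered.append((target, answer))


first = Handle("agent-1")
in_progress["track-17"] = first         # first process advertises its handle

second = Handle("agent-2")              # a later process needs the same answer
attach(second, in_progress["track-17"])

finish(first, "fix at (3, 4)", record)  # answers both requesters
finish(second, None, record)            # no targets left, so nothing is sent
```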

4.5  What  CAOS  Offers  Over  CARE 

CAOS is a large system. It is reasonable to ask what advantages there are to programming in CAOS
as opposed to programming in CARE. We believe there are three major advantages:

3 Neither CAOS nor CARE currently supports a predicated multicast mode, wherein messages would be sent to all
agents satisfying a particular predicate; messages can only be sent to a fully-specified list of agents.

Clarity: The framework in which an agent is declared makes explicit its storage requirements and
functional behavior. In addition, the agent concept is a helpful abstraction at which to view
activity in a multiprocessing software architecture. The concept lets us partition a flat collection
of processes on a site into groups of processes attached to agents on a site. CAOS guarantees
the only interaction between processes attached to different agents is by message-passing.

Convenience: The programmer is freed from interfacing to CARE's low-level communications
primitives. As we said earlier, CAOS is basically an operating system, and as such, it shields the
programmer from the same class of details a conventional operating system does in a
conventional hardware environment.

Flexibility:  Currently,  CARE  schedules  processes  in  a  strict  first-in,  first-out  manner.  CAOS,  on  the 
other  hand,  can  implement  arbitrary  scheduling  policies  (though  at  a  substantial  performance 
cost;  we  discuss  this  in  Chapter  6). 


Chapter  5 

The  Runtime  Structure  of  CAOS 


CAOS is structured around three principal levels: site, agent, and process. Two of these levels (site
and process) reflect the organization of CARE; the remaining (agent) level is an artifact of CAOS.
We discuss first the general design principles underlying CAOS, and then describe in greater detail
the functions and structure of each of CAOS's levels. Appendix B offers a complete guide to the
algorithms and data structures employed in CAOS.

5.1  General  Design  Principles 

The implementation of CAOS described in this paper is written in ZetaLisp, a dialect of Lisp which
runs on a number of commercially available single-user Lisp workstations. ZetaLisp includes an
object-oriented programming tool, called Flavors, which has proved to be a very powerful facility
for structuring large Lisp applications.

In  Flavors,  the  behavior  of  an  object  is  described  by  templates  known  as  classes.  An  instance, 
a  representation  of  an  individual  object,  is  created  by  instantiating  a  class.  Instances  respond  to 
messages  defined  by  their  class,  and  contain  static  local  storage  in  the  form  of  instance  variables. 
Classes  are  defined  within  an  inheritance  network;  each  instance  contains  the  instance  variables  and 
responds  to  the  messages  defined  in  its  class,  as  well  as  those  of  the  classes  from  which  its  class 
inherits. 

An appropriate usage for Flavors is the modelling of the behavior of objects in some (not
necessarily real) world. For example, CAOS site and agent structures are realized as Flavors instances.
The characteristics to be modelled are codified in instance variables and message names. In a
well-designed application, messages and variables are consistently named; thus, the implementation of a
particular behavior is totally encapsulated in the anonymous function which responds to a message.

5.1.1  Extending  the  Notion 

In  some  sense,  a  Flavors  instance  is  an  abstract  data  type.  The  instance  holds  state,  and  provides 
advertised,  public  interfaces  (messages)  to  functions  which  change  or  access  its  state.  The  internal 
data  representation  and  implementations  of  the  access  functions  are  private. 


In Flavors, the abstract data type notion is unavailable within an individual instance. Frequently,
the individual instance variables hold complex structures (such as dictionaries and priority queues)
which ought to be treated as abstract data types, but there exists no common means within the
standard Flavors mechanism for doing so.

CAOS, however, supports such a mechanism, by providing a means of sending messages to instance
variables (rather than to the instances themselves). The instance variables are thus able to store
anonymous structures, which are initialized, modified, and accessed through messages sent to the
variable. Similar mechanisms exist in the Unit Package [14] and in the STROBE system [13], both
frameworks for representing structured knowledge.

The CAOS environment includes a number of abstract data types which were found to be useful
in supporting its own implementation. The most commonly used are:

Dictionary: The dictionary is an association list. It responds to put, get, add, forget, and
initialize messages.

Sorted Dictionary: The sorted-dictionary is also implemented as an association list, and responds
to the same messages as does the standard dictionary. However, the sorted-dictionary invokes
a user-supplied priority function to merge new items into the dictionary (higher-priority items
appear nearer the front of the dictionary). This dictionary is able to respond to the greatest
message, which returns the entry with the highest priority, and to the next message, which
returns the entry with the next-highest priority as compared to a given entry.

The  sorted-dictionary  is  used  primarily  to  hold  time-indexed  data  which  may  be  collected 
out-of-order  (e.g.  when  data  for  time  n  +  1  may  arrive  before  data  for  time  n). 

Hash  Dictionary:  The  hash-dictionary  is  implemented  with  a  hash  table,  and  responds  to  the  same 
messages  as  the  unsorted  association  list  dictionary. 

Queue:  The  queue  data  type  is  a  conventional  first-in,  first-out  storage  structure.  The  put  message 
enqueues  an  item  on  the  tail  of  the  queue,  while  the  get  message  dequeues  an  item  from  the 
head  of  the  queue. 

Priority  Queue:  The  priority-queue  data  type  supports  a  dynamic  heapsort,  and  is  implemented  as  a 
partially-ordered  binary  tree.  It  responds  to  put,  get,  and  initialize  messages.  Associated 
with  the  queue  is  a  function  which  computes  and  compares  the  priority  of  two  arbitrary  queue 
elements;  this  function  drives  the  rebalancing  of  the  binary  tree  when  elements  are  added  or 
deleted. 

Monitor: A monitor provides mutual exclusion within a dynamically-scoped block of Lisp code. It
is similar in implementation to the monitors of Interlisp-D and Mesa [10].

If  the  monitor  is  unlocked,  the  obtain-lock  message  stores  the  caller’s  process  id  as  the 
monitor’s  owner,  and  marks  the  monitor  as  locked;  otherwise,  if  the  monitor  is  locked,  the 
obtain-lock  message  places  the  caller's  process  id  on  the  tail  of  the  monitor's  waiting  queue, 
and  suspends  the  calling  process. 

The release-lock message removes the process id from the head of the monitor's waiting
queue, marks the monitor's owner to be that id, and reschedules the associated process.

Monitors are normally accessed using the with-monitor form, which accepts the name of
an instance variable containing a monitor, and which cannot be entered until the calling
process obtains ownership of the monitor. The with-monitor form guarantees ownership of
the monitor will be relinquished when the calling process leaves the scope of the form, even if
an error occurs.
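Of the data types listed above, the sorted dictionary is perhaps the least conventional, so a small sketch may help. It is written in Python rather than the report's Lisp and is not the CAOS implementation: the class layout and the convention that the priority function maps a key to a comparable rank are assumptions, and the add and initialize messages are omitted because their exact behavior is not described here.

```python
class SortedDictionary:
    """Association list kept in priority order, highest priority first."""

    def __init__(self, priority):
        self._priority = priority      # user-supplied: key -> comparable rank
        self._entries = []             # list of (key, value) pairs

    def put(self, key, value):
        self.forget(key)
        rank = self._priority(key)
        index = 0
        while (index < len(self._entries)
               and self._priority(self._entries[index][0]) >= rank):
            index += 1
        self._entries.insert(index, (key, value))   # merge into place

    def get(self, key):
        for k, v in self._entries:
            if k == key:
                return v
        return None

    def forget(self, key):
        self._entries = [(k, v) for k, v in self._entries if k != key]

    def greatest(self):
        return self._entries[0] if self._entries else None

    def next(self, key):
        """Entry with the next-highest priority after the entry for key."""
        for i, (k, _) in enumerate(self._entries[:-1]):
            if k == key:
                return self._entries[i + 1]
        return None


# Time-indexed data arriving out of order; here a later time ranks higher.
track = SortedDictionary(priority=lambda time: time)
track.put(2, "fix-b")
track.put(1, "fix-a")      # data for time 1 arrives after data for time 2
track.put(3, "fix-c")
```

Even though the entry for time 1 arrived late, walking the dictionary with greatest and next visits the entries in time order.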

5.2  The  CAOS  Site  Manager 

The site manager consists of a Flavors instance containing information global to the site, that is,
information needed by all agents located on the site. In addition, the site manager includes a CARE-level
process which performs the functions of creating new agents and translating agent names into agent
addresses, as described below.

The  following  instance  variables  are  part  of  the  site  manager: 

incoming-stream: This instance variable contains the CARE input stream address on which the site
manager process listens for requests. Agents needing to send messages to their site manager
may reference this instance variable in order to discover the address to which to direct site
requests.

static-agent-stream-table: This instance variable is a dictionary which maps agent names into
the CARE streams which may be used to communicate with the agents. The entries in this
dictionary reflect statically-created agents; new entries are added as the result of
new-initial-agent-online messages directed to the site (see below). The dictionary is used to resolve agent
name-to-address requests from agents created locally.

unresolved-agent-stream-table: The site manager keeps track of agent names it is not able to
translate to addresses by placing unsatisfiable request-symbolic-reference requests in this
dictionary. The keys of the dictionary are unresolvable agent names. As the agent names
become resolvable, the unsatisfied requests are satisfied, and the corresponding entries are
removed from the dictionary.

After the initialization phase of a CAOS application has completed, there will be no entries in
this dictionary in any of the sites.

local-agents: This instance variable is a dictionary whose keys are the names of agents located
on the site, and whose values are pointers to the Flavors instances which represent each agent.
local-agents is used only for debugging and status-reporting purposes.

free-process-queue:  When  a  CARE  process  which  was  created  to  service  a  request  finishes  its 
work,  it  tries  to  perform  another  task  for  the  agent  in  which  it  was  created.  If  the  agent 
has  no  work  to  do,  the  process  suspends  itself,  after  enqueuing  identifying  information  in  this 
instance  variable,  which  holds  a  queue  abstract  data  type.  When  any  agent  on  the  same 
site  needs  a  new  process  to  service  some  request,  it  checks  this  queue  first;  if  there  are  any 
suspended  (free)  processes  waiting  in  this  queue,  it  dequeues  one  and  gives  it  a  task  to  perform. 
If  this  queue  is  empty,  the  agent  asks  CARE  to  create  a  new  process. 


The  site  manager  responds  to  the  following  messages: 

new-initial-agent-online: As each static agent starts running during initialization of a CAOS
run, it broadcasts its name and CARE input stream to every site in the system, using this
message. The correspondence between the sending agent's name and address is placed in
the static-agent-stream-table dictionary for future reference by agents located on the
receiving sites. If any agents have placed requests for this new agent in the
unresolved-agent-stream-table, messages containing the new agent's name and address are sent to the
waiting agents.

request-symbolic-reference: Whenever a static agent is created, it runs an initialization
function, which among other tasks, caches needed agent name-to-address translations. For each
translation, the agent sends this message to its site manager. If the site manager can resolve
the name upon receipt of the message, it responds immediately; otherwise, it queues the
request in the unresolved-agent-stream-table, and defers answering until it is able to satisfy
the request. The requesting agent waits until it has received the answer before requesting
another translation.

make-new-agent: This message is sent to a site to cause a new agent to be created during the
course of a CAOS run. The site manager creates the new (dynamic) agent and returns the
agent's input stream to the sender of this message. The newly-created agent is not placed
in the static-agent-stream-table; thus, the only way to advertise the existence of such a
dynamically-created agent is by the creator of an agent passing the returned input stream to
other agents.
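The interplay between the two tables and the first two messages above can be sketched as follows, in Python rather than the report's Lisp. This is a toy, single-threaded model: replies are modeled as callbacks instead of CARE messages, and the stream value "stream-17" and the helper names are hypothetical.

```python
class SiteManager:
    """Toy model of name-to-address resolution with deferred replies."""

    def __init__(self):
        self.static_agent_stream_table = {}      # agent name -> stream
        self.unresolved_agent_stream_table = {}  # agent name -> waiting replies

    def request_symbolic_reference(self, name, reply):
        if name in self.static_agent_stream_table:
            # The name is already known: answer immediately.
            reply(name, self.static_agent_stream_table[name])
        else:
            # Otherwise queue the request until the agent comes online.
            self.unresolved_agent_stream_table.setdefault(name, []).append(reply)

    def new_initial_agent_online(self, name, stream):
        self.static_agent_stream_table[name] = stream
        # Satisfy, then drop, any requests that were waiting on this name.
        for reply in self.unresolved_agent_stream_table.pop(name, []):
            reply(name, stream)


site = SiteManager()
resolved = {}


def remember(name, stream):
    resolved[name] = stream


site.request_symbolic_reference("el-threat-reporter-0", remember)   # deferred
deferred_snapshot = dict(resolved)
site.new_initial_agent_online("el-threat-reporter-0", "stream-17")
site.request_symbolic_reference("el-threat-reporter-0", remember)   # immediate
```

Once every static agent has announced itself, the unresolved table is empty, matching the invariant stated for the end of initialization.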

5.3  The  CAOS  Agent 

As discussed above, CAOS agents are implemented as Flavors instances. Their classes
are defined by translating defagent expressions into defflavor expressions. CAOS itself defines
two basic agent classes: vanilla-agent and process-agenda-agent. vanilla-agent defines the
minimal agent; process-agenda-agent is defined in terms of vanilla-agent, but adds the ability
to assign priorities to messages.1 These basic agents are fully-functional, but lack domain-specific
“knowledge,” and cannot be used directly in problem solving applications.

As stated in the previous chapter, a CAOS agent is a multiple-process entity. Most of these
processes are created in the course of problem-solving activity; we refer to these as user processes.
At runtime, however, there are always two special processes associated with each CAOS agent. One
of these processes monitors the CARE stream by which the agent is known to other agents. The
other participates in the scheduling of user processes. We shall refer to the first of these processes
as the agent input monitor, and to the second of these processes as the agent scheduler. We explain
in detail the functioning of these two processes in the next section.

We describe here the role of important instance variables in a basic CAOS agent:

1 This is important for applications in which one agent must respond rapidly to a posting from another agent.
Assigning a message a high priority will cause that message to be processed ahead of any other messages with lower
priorities.


self-address: This instance variable is an analogue of Flavors' self variable. Whereas self is
bound to the Flavors instance under which a message is executing, self-address is bound to
the stream of the agent under which a CAOS message is executing. Thus, an agent can post a
message to itself by posting the message to self-address.

runnable-process-stream: This instance variable points to the stream on which the scheduler
process listens. Processes which need to inform the scheduler of various conditions do so by
sending CARE-level messages to this stream.

running-processes: This variable holds the list of user processes which are currently executing
within the agent. The current CARE architecture supports only a single evaluator on each site.
CAOS tries to keep a number of user processes ready to execute at all times; thus, the single
CPU is kept as busy as possible.

runnable-process-list: A priority queue containing the runnable user processes. As a process is
entered on the queue, its priority is calculated to determine its ranking in the partial ordering.
There are two available priority evaluation functions: the first computes the priority based
solely on the time the process entered the system; the second considers the assigned priority of
the executing message before considering the entry time of the process. These two functions
are used to implement the scheduling algorithms of the vanilla-agent and the
process-agenda-agent, respectively.

scheduler-lock :  The  scheduler  data  structures  are  subject  to  modification  by  any  number  of 
processes  concurrently.  The  scheduler-lock  is  a  monitor  which  provides  mutual  exclusion 
against  simultaneous  access  to  the  scheduler  database. 
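The two priority evaluation functions described under runnable-process-list can be sketched as sort keys over a heap, in Python rather than the report's Lisp. The field names, the numeric message priorities (larger meaning more urgent), and the tie-breaking counter are assumptions made for illustration; they are not taken from the CAOS source.

```python
import heapq
import itertools

_tiebreak = itertools.count()   # keeps heap entries comparable on equal keys


def vanilla_priority(item):
    """vanilla-agent ordering: earliest entry time runs first."""
    return (item["time_stamp"],)


def agenda_priority(item):
    """process-agenda-agent ordering: message priority, then entry time."""
    return (-item["message_priority"], item["time_stamp"])


def enqueue(queue, item, priority_fn):
    # heapq pops the smallest key, so smaller tuples run sooner.
    heapq.heappush(queue, (priority_fn(item), next(_tiebreak), item))


def dequeue(queue):
    return heapq.heappop(queue)[2]


items = [
    {"name": "a", "time_stamp": 1, "message_priority": 0},
    {"name": "b", "time_stamp": 2, "message_priority": 5},
    {"name": "c", "time_stamp": 3, "message_priority": 5},
]
fifo, agenda = [], []
for it in items:
    enqueue(fifo, it, vanilla_priority)
    enqueue(agenda, it, agenda_priority)
fifo_order = [dequeue(fifo)["name"] for _ in items]
agenda_order = [dequeue(agenda)["name"] for _ in items]
```

With the first function the items run in arrival order; with the second, the two high-priority messages overtake the earlier low-priority one and are then ordered by entry time.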

5.4  The  CAOS  Process 

In this section, we describe the mechanism by which CAOS user processes are scheduled for execution
on CARE sites. User processes are created in response to messages from other agents. Associated
with each user process is a data structure called a runnable-item. The runnable-item contains
the following fields:

message-name, -args, -id, -answer-targets: These fields store the information necessary to
handle a message request and send the resulting answer back to the proper agents.

for-effect: This field is a boolean, and indicates whether the message is being executed for effect
or value. This corresponds directly to whether the message came from a post operation
or a post-future operation.

state: This field indicates the state of the process. The possible states that a process may enter,
and the finite state machine which defines the state transitions, are discussed in the next section.

context: This field contains a pointer to the CARE stream upon which the process waits when it
is not runnable. A process (such as the scheduler) wishing to wake another process simply sends
a message to this stream. The suspended process will thus be awakened (by CARE).


time-stamp: This field contains the time at which the process entered the system. It is used by
the functions which calculate the execution priority of processes.

The CAOS scheduler's only handle on a process is the process's runnable-item. In fact, the
only communication between a user process and the CAOS scheduler consists of the exchange of
runnable-items.

5.5  Flow  of  Control 

In the following, we detail how a user process, the CAOS input monitor, and the CAOS scheduler
interact to process a message request from a remote agent. For purposes of exposition, we assume
the following sequence of events:

1. An agent, agent-1, executes a post operation, with agent-2 as the target. The posting is
for the message named message-a.

2. agent-2 receives and executes the posting. In order to complete the execution of message-a,
it must perform a post-value operation to a third agent, agent-3.

We  begin  at  the  point  where  agent-1  has  performed  its  post  operation. 

5.5.1  Input  Processing 

The input monitor process handles requests and responses from remote agents. When the message
from agent-1 enters agent-2, its input monitor creates a new runnable-item to hold the state of
the request. The message name, arguments, id, and answer targets are copied from the incoming
message into the runnable-item. The runnable-item's state is set to never-run, and its time
stamp is set to the current time. In order to queue the message for execution, the input monitor
takes one of two actions.

If the agent's runnable-process-list is empty, the runnable-item is sent in a message to
the agent scheduler process (by sending the item in a message to the stream whose address is
found in the agent's runnable-process-stream instance variable). When the agent's
runnable-process-list is empty, the scheduler process is guaranteed to be waiting for messages sent to
the scheduler stream, and hence, will be awakened by the message sent from the input monitor.
The scheduler then computes the priority of the message, and places the runnable-item in its
runnable-process-list.

If the agent's runnable-process-list is not empty, the input monitor computes the message's
priority and places the runnable-item on the runnable-process-list itself. When the queue is
not empty, it is guaranteed that the scheduler will examine the queue sometime in the future to
make scheduling decisions; thus, it is not necessary to send any messages to the scheduler to inform
it of the existence of new processes.
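The two queueing cases just described can be sketched as follows, in Python rather than the report's Lisp. This is a single-threaded toy: plain lists stand in for the priority queue and for the scheduler's CARE stream, the scheduler-lock is omitted, entry time is the only priority considered, and all class and method names are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunnableItem:
    message_name: str
    message_args: tuple
    message_id: int
    answer_targets: list
    for_effect: bool
    state: str = "never-run"
    time_stamp: float = field(default_factory=time.monotonic)


class Agent:
    def __init__(self):
        self.runnable_process_list = []   # stands in for the priority queue
        self.scheduler_inbox = []         # stands in for runnable-process-stream

    def input_monitor_receive(self, item):
        if not self.runnable_process_list:
            # Queue empty: the scheduler is waiting on its stream, so wake it
            # by sending it the item; it does the enqueueing itself.
            self.scheduler_inbox.append(item)
        else:
            # Queue not empty: the scheduler will examine the queue anyway.
            self.enqueue(item)

    def scheduler_wakeup(self):
        while self.scheduler_inbox:
            self.enqueue(self.scheduler_inbox.pop(0))

    def enqueue(self, item):
        self.runnable_process_list.append(item)
        self.runnable_process_list.sort(key=lambda i: i.time_stamp)


agent_2 = Agent()
first = RunnableItem("message-a", (), 1, ["agent-1"], for_effect=True)
agent_2.input_monitor_receive(first)          # sent to the scheduler
woke_scheduler = len(agent_2.scheduler_inbox) == 1
agent_2.scheduler_wakeup()
second = RunnableItem("message-a", (), 2, ["agent-1"], for_effect=True)
agent_2.input_monitor_receive(second)         # enqueued directly
```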
5.5.2 Creating Processes

Eventually, the newly-created runnable-item will reach the head of agent-2's
runnable-process-list. At this time, there is still no process associated with the item, so the scheduler creates a
process using the facilities of CARE, adds the process to the running-processes list, and passes it
its runnable-item. The process will eventually gain control of the evaluator, and will set the state
of its runnable-item to running. It then begins executing the requested posting.

5.5.3  Requesting  Remote  Values 

At  some  point,  the  process  executing  on  agent-2  requires  a  value  from  agent-3,  and  performs 
a post-value operation to acquire it. The process looks up the address of agent-3, and posts
a message which contains the appropriate message name, arguments, id, and answer target. The
message-id unambiguously identifies the future upon which the process will be waiting for the
value  to  be  returned.  The  answer  target  is  the  agent’s  own  self-address;  when  the  answer  is 
received  by  the  input  monitor  process,  it  will  be  forwarded  to  the  appropriate  future,  and  the 
process  will  be  reawakened. 

In the meantime, the process sets its state to suspended, removes its runnable-item from the
running-processes list, and appends it to the list of processes already waiting for the future to be
satisfied. If the runnable-process-list is not empty, the suspending process wakes the process
at the head of the queue.2 The suspending process then waits for a message on its wakeup stream,
the stream whose address is in the context field of its runnable-item.

5.5.4  Answer  Processing 

Some  time  later,  agent-3  will  have  completed  its  computations,  and  will  have  returned  the  desired 
answer  to  agent-2.  The  answer  will  be  received  by  agent-2’s  input  monitor  process,  which  will 
recognize the input as a value to be placed in a future. The input monitor sets the value field of the
appropriate future, and moves the runnable-items of the processes waiting on the future to the
runnable-process-list.

If  the  queue  was  previously  empty,  the  agent  must  have  been  (or  will  soon  be)  entirely  idle;  thus, 
the runnable-items are sent to the scheduler in a message, causing the scheduler to be reawakened.
If the queue was not previously empty, the agent must be busy, so the items are simply added to the
queue according to their priorities. In both cases, the runnable-items are placed in the runnable
state.
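
The suspension and answer-processing steps of Sections 5.5.3 and 5.5.4 can be modeled with a small Python sketch. It is a simplification under stated assumptions: `Future`, `suspend_on`, and `receive_answer` are hypothetical names, message ids are plain integers, the scheduler stream is a list, and higher numbers are assumed to mean higher priority.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RunnableItem:
    name: str
    priority: int = 0
    state: str = "running"


@dataclass
class Future:
    value: Any = None
    satisfied: bool = False
    waiting: List[RunnableItem] = field(default_factory=list)


class Agent:
    def __init__(self) -> None:
        self.futures: Dict[int, Future] = {}            # keyed by message-id
        self.running_processes: List[RunnableItem] = []
        self.runnable_process_list: List[RunnableItem] = []
        self.scheduler_stream: List[RunnableItem] = []  # stand-in for a stream

    def suspend_on(self, item: RunnableItem, message_id: int) -> None:
        # The requesting process suspends and joins the future's waiters.
        item.state = "suspended"
        self.running_processes.remove(item)
        self.futures.setdefault(message_id, Future()).waiting.append(item)

    def receive_answer(self, message_id: int, value: Any) -> None:
        # Input monitor: fill in the future, then make the waiters runnable.
        future = self.futures[message_id]
        future.value, future.satisfied = value, True
        waiters, future.waiting = future.waiting, []
        was_empty = not self.runnable_process_list
        for item in waiters:
            item.state = "runnable"
        if was_empty:
            # Agent idle (or about to be): wake the scheduler with a message.
            self.scheduler_stream.extend(waiters)
        else:
            # Agent busy: add the items to the queue by priority (stable sort).
            self.runnable_process_list.extend(waiters)
            self.runnable_process_list.sort(key=lambda i: -i.priority)
```

Several processes may wait on the same future, which is why the waiters are held in a list rather than a single slot.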


5.5.5  Reawakening  Suspended  Processes 

When the runnable runnable-item reaches the head of agent-2's runnable-process-list, a
message (which contains no useful information) is sent to its associated process's wakeup stream.
As a result, the process eventually wakes up, gains control of the evaluator, and sets its state to running.

2 In effect, the process takes on the role of the scheduler. Although the system would continue to work with only
a designated scheduler process performing scheduler duties, this arrangement permits scheduling to take place with
minimal latency. As a result, fewer evaluator cycles are wasted waiting for the scheduler process to run the next user
process.

5.5.6  Completing Computation

A  process  may  perform  any  number  of  post,  post-future,  or  post-value  operations  during  its 
lifetime.  Eventually,  however,  the  process  will  complete,  having  computed  a  value  which  may  or 
may  not  be  sent  back  to  the  requesting  agent.  If  the  process  was  suspended  for  any  portion  of  its 
lifetime,  another  process  may  have  attached  to  it;  in  this  case,  the  process  may  have  more  than  one 
requesting  agent  to  which  to  return  an  answer. 

Before  the  process  terminates,  it  examines  the  head  of  the  runnable-process-list.  If  the 
queue is empty, the process simply goes away. If the runnable-item at the head of the queue is
runnable, it sends the appropriate message to awaken the associated process. Finally, if the item
is never-run, the process makes itself the process associated with this new runnable-item, and
executes the new message in its own context.3 Barring this possibility, the process “queues” itself
on a free process queue associated with the site manager; when a new process is needed by an agent
on the site, one is preferentially removed from this queue and recycled before an entirely new process
is  created.  This  way,  processes,  which  are  expensive  to  create,  are  reused  as  often  as  possible. 


3 This  is  another  situation  in  which  an  application  process  performs  scheduling  duties. 
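
The decision a finishing process makes can be summarized in a short Python sketch. The function names, the dictionary representation of a runnable-item, and the choice to remove an adopted item from the queue are assumptions made for illustration; the report does not give this code.

```python
from typing import Callable, List


def on_process_completion(runnable_process_list: List[dict],
                          free_process_queue: List[str],
                          process: str) -> str:
    """Model what a finishing process does before it terminates."""
    if runnable_process_list:
        head = runnable_process_list[0]
        if head["state"] == "never-run":
            # Adopt the new item and run its message in this process's context.
            runnable_process_list.pop(0)
            head["state"] = "running"
            head["process"] = process
            return "adopted-item"
        if head["state"] == "runnable":
            # Send a (content-free) wakeup message to the suspended process.
            head["wakeups"] = head.get("wakeups", 0) + 1
            free_process_queue.append(process)
            return "woke-next"
    # Nothing to adopt: park on the site's free process queue for reuse.
    free_process_queue.append(process)
    return "parked"


def acquire_process(free_process_queue: List[str],
                    create: Callable[[], str]) -> str:
    """Prefer a recycled process; create a new one only when none is free."""
    return free_process_queue.pop(0) if free_process_queue else create()
```

Recycling through `acquire_process` reflects the stated motivation: processes are expensive to create, so a parked one is always preferred over a fresh one.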

Chapter 6

Results and Conclusions


The CAOS system we have described has been fully implemented and is in use by two groups within
the Advanced Architectures Project. CAOS runs on the Symbolics 3600 family of machines, as well
as on the Texas Instruments Explorer Lisp machine. ELINT, as described in Section 3.2, has also
been fully implemented. We are currently analyzing its performance on processor grids of various
sizes and at various data rates.

6.1  Evaluating  CAOS 

CAOS is a rather special-purpose environment, and should be evaluated with respect to the
programming of concurrent real-time signal interpretation systems. In this chapter, we explore CAOS's
suitability along the following dimensions:

•  Expressiveness

•  Efficiency 

•  Scalability 

6.1.1  Expressiveness 

When we ask that a language be suitably expressive, we ask that its primitives be a good match
to the concepts the programmer is trying to encode. The programmer shouldn't need to resort to
low-level “hackery” to implement operations which ought to be part of the language. We believe
we have succeeded in meeting this goal for CAOS (although to date, only CAOS's designers have
written CAOS applications). Programming in CAOS is programming in Lisp, but with added features
for declaring, initializing, and controlling concurrent, real-time signal interpretation applications.

6.1.2  Efficiency


CAOS has a very complicated architecture. The lifetime of a message, as described in Section 5.5,
involves numerous processing states and scheduler interventions. Much of this complexity derives
from the desire to support alternate scheduling policies within an agent. The cost of this complexity
is approximately one order of magnitude in processing latency. For the common settings of
simulation parameters, CARE messages are exchanged in about 2-3 milliseconds, while CAOS messages
require about 30 milliseconds. It is this cost which forces us to decompose applications coarsely,
since more fine-grained decompositions would inevitably require more message traffic.

We conclude that CAOS does not make efficient use of the underlying CARE architecture. A
compromise, which we are just beginning to explore, would be to avoid the complex flow of control
described in Section 5.5 in agents whose scheduling policies are the same as CARE's (FIFO). In such
agents, we could reduce the CAOS runtimes to simple functional interfaces to CARE. We anticipate
such an approach would be much more efficient.

6.1.3  Scalability 

A system which scales well is one whose performance increases commensurately with its size.
Scalability is a common metric by which multiprocessor hardware architectures are judged: does a
100-processor realization of a particular architecture perform 10 times better than a 10-processor
realization of the same architecture? Does it perform 5 times better? Only just as well? Or worse?
In  hardware  systems,  scalability  is  typically  limited  by  various  forms  of  contention  in  memories, 
busses,  etc.  The  100-processor  system  might  be  slower  than  the  10-processor  system  because  all 
interprocessor  communications  are  routed  through  an  element  which  is  only  fast  enough  to  support 
10  processors. 

We  ask  the  same  question  of  a  CAOS  application:  does  the  throughput  of  ELINT,  for  example, 
increase  as  we  make  more  processors  available  to  it?  This  question  is  critical  for  CAOS-based  real-time 
interpretation  systems;  our  only  means  of  coping  with  arbitrarily  large  data  rates  is  by  increasing 
the  number  of  processors.  Section  6.2  discusses  this  issue  in  detail. 

We believe CAOS scales well with respect to the number of available processors. The potential
limiting factors to its scaling are (1) increased software contention, such as the inter-pipeline
bottlenecks described in Section 3.1.2, and (2) increased hardware contention, such as overloaded
processors and/or communication channels. Software contention can be minimized by the design
of the application. Communications contention can be minimized by executing CAOS on top of
an appropriate hardware architecture (such as that afforded by CARE); CAOS applications tend to
be coarsely decomposed (they are bounded by computation, rather than communication), and thus
communications loading has never been a problem.

Unfortunately,  processor  loading  remains  an  issue.  A  configuration  with  poor  load  balancing,  in 
which  some  processors  are  busy,  while  others  are  idle,  does  not  scale  well.  Increased  throughput  is 
limited  by  contention  for  processing  resources  on  overloaded  sites,  while  resources  on  unloaded  sites 
go unused. The problem of automatic load balancing is not addressed by CAOS; agents are assigned
to  processing  sites  on  a  round-robin  basis,  with  no  attempt  to  keep  potentially  busy  agents  apart. 
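
A small Python sketch shows why round-robin placement alone does not balance load. The agent names, site names, and work figures are invented for the example; only the round-robin rule itself comes from the text.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, Iterable, List


def assign_round_robin(agents: Iterable[str], sites: List[str]) -> Dict[str, str]:
    """Place each agent on the next site in turn, ignoring expected load."""
    site_cycle = cycle(sites)
    return {agent: next(site_cycle) for agent in agents}


def site_load(assignment: Dict[str, str], work: Dict[str, int]) -> Counter:
    """Total (hypothetical) work units landing on each site."""
    load: Counter = Counter()
    for agent, site in assignment.items():
        load[site] += work[agent]
    return load


# Two busy agents ("a" and "c") happen to land on the same site.
assignment = assign_round_robin(["a", "b", "c", "d"], ["site-0", "site-1"])
load = site_load(assignment, {"a": 10, "b": 1, "c": 10, "d": 1})
```

Here `site-0` receives both busy agents while `site-1` is nearly idle, which is the kind of imbalance the paragraph above describes.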

ELINT Performance             Control Type / Grid Size
Dimension             NC 4x4   CC 4x4   CC 6x6   CT 2x2   CT 4x4   CT 6x6

False Alarms               1        0        0        0        0        0
Reincarnation             49       42        2        0        0        0
Confidence Level          19       20       90       89       93       95
Fixes                     48       42       99      100      100      100
Fusion                     0        0       77       85       88       89

Table 6.1: Quality of ELINT performance for various grid sizes and control strategies (1 ELINT time
unit = 0.1 seconds).

6.2  Evaluating  ELINT  Under  CAOS 

Our experience with ELINT indicates the primary determiner of throughput and answer-quality is
the strategy used in making individual agents cooperate in producing the desired interpretation. Of
secondary importance is the degree to which processing load is evenly balanced over the processor
grid. We now discuss the impact of these factors on ELINT's performance.

The  following  three  strategies  were  used  in  our  experiments: 

NC: This strategy represents limited inter-agent control. No attempt is made to prevent concurrent
creation of multiple copies of the “same” agent (this possibility arises when multiple requests
to create the agent arrive simultaneously at a single manager). As a result, multiple,
non-communicating copies of an abstraction pipeline are created; each receives only a portion of
the input data it requires. The NC strategy was expected to produce poor results, and was
intended only as a baseline against which to compare more realistic control strategies.

CC: In this strategy, the manager agents assure that only one copy of an agent is created, irrespective
of the number of simultaneous creation requests; all requestors are returned pointers to the
single new agent. Originally, we believed the CC (for “creation control”) strategy would be
sufficient for ELINT to produce correct high-level interpretations.

CT: The CT (“creation and time control”) strategy was designed to manage skewed views of
real-world time which develop in agent pipelines. In particular, this strategy prevents an emitter
agent from deleting itself when it has not received a new observation in a while, yet some
observation-handler agent has sent the emitter an observation which it has yet to receive.
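
The creation-control idea behind the CC strategy (one agent per identity, no matter how many requests arrive at once) can be sketched in Python. This is an analogy rather than the CAOS mechanism: the `Manager` class, the string keys, and the use of a thread lock to serialize requests are assumptions made for the example.

```python
import threading
from typing import Any, Callable, Dict


class Manager:
    """Hands out exactly one agent per key, however many requests arrive."""

    def __init__(self, create: Callable[[str], Any]) -> None:
        self._create = create
        self._agents: Dict[str, Any] = {}
        self._lock = threading.Lock()   # assumed: requests are serialized

    def request_agent(self, key: str) -> Any:
        with self._lock:
            if key not in self._agents:
                # First request creates the agent ...
                self._agents[key] = self._create(key)
            # ... and every requestor gets a pointer to the same one.
            return self._agents[key]
```

Without the check-then-create step being atomic, simultaneous requests could each create their own copy, which is the failure mode the NC baseline exhibits.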

Table 6.1 illustrates the effects of various control strategies and grid sizes. The table presents
five performance attributes by which the quality of an ELINT run is measured.

False Alarms: This attribute is the percentage of emitter agents that ELINT should not have
hypothesized as existing.

ELINT  was  not  severely  impacted  by  false  alarms  in  any  of  the  configurations  in  which  it  was 
run. 

* This run was far from completion when it was halted due to excessive accumulated wall-clock time.

Table 6.2: Simulated time required to complete an ELINT run (1 ELINT time unit = 0.1 seconds).


Control Type    Message Count
                4x4
NC              > 16118    -
CC              7375
CT              4516    4703    4616

Table 6.3: Number of messages exchanged during an ELINT run (1 ELINT time unit = 0.1 seconds).


Grid Size               1x1    2x2    3x3    4x4    5x5    6x6
Simulated Time (sec)   9.42   3.20   1.49   0.74   0.52   0.56

Table 6.4: Overall simulation times for the CT control strategy (1 ELINT time unit = 0.01 seconds,
debugging agents turned off).

Reincarnation: This attribute is the percentage of recreated emitter agents (e.g., emitters which
had previously existed but had deleted themselves due to lack of observations). Large numbers
of reincarnated emitters indicate some portion of ELINT is unable to keep up with the data rate
(i.e., the data rate may be too high globally, so that all emitters are overloaded, or the data
rate may be too high locally, due to poor load balancing, so that some subset of the emitters
are overloaded).

The  CT  control  strategy  was  designed  to  prevent  reincarnations;  hence,  none  occurred  when 
CT was employed on any size grid. When CC was used, only the 6 x 6 grid was large enough
for  ELINT  to  keep  up  with  the  input  data  rate. 

Confidence  Level:  This  attribute  is  the  percentage  of  correctly-deduced  confidence  levels  of  the 
existence  of  an  emitter. 

The  correct  calculation  of  confidence  levels  depends  heavily  on  the  system  being  able  to  cope 
with  the  incoming  data  rate.  One  way  to  improve  confidence  levels  was  to  use  a  large  processor 
grid.  The  other  was  to  employ  the  CT  control  strategy,  since  fewer  reincarnations  result  in 
fewer  incorrect  (e.g.,  too  low)  confidence  levels. 

Fixes: This attribute is the percentage of correctly-calculated fixes of an emitter.

Fixes can be computed when an emitter has seen at least two observations in the same time
interval. If an emitter is undergoing reincarnation, it will not accumulate enough data to
regularly  compute  fixes.  Thus,  the  approaches  which  minimized  reincarnation  maximized  the 
correct  calculation  of  fix  information. 

Fusion: This attribute is the percentage of correct clustering of emitter agents to cluster agents.

The correct computation of fusion appeared to be related, in part, to the correct computation
of confidence levels. The fusion process is also the most knowledge-intensive computation in
ELINT, and our imperfect results indicate the extent to which ELINT's knowledge is incomplete.

We  interpret  from  Table  6.1  that  control  strategy  has  the  greatest  impact  on  the  quality  of 
results.  The  CT  strategy  produced  high-quality  results  irrespective  of  the  number  of  processors 
used.  The  CC  strategy,  which  is  much  more  sensitive  to  processing  delays,  performed  nearly  as  well 
only  on  the  6x6  processor  grid.  We  believe  the  added  complexity  of  the  CT  strategy,  while  never 
detrimental,  is  only  beneficial  when  the  interpretation  system  would  otherwise  be  overloaded  by 
high  data  rates  or  poor  load  balancing. 

Tables 6.2 and 6.3 indicate that the cost of the added control in the CT strategy is far outweighed
by the benefits of its use. Far less message traffic is generated, and the overall simulation time is
reduced (in Table 6.2, the last observation is fed into the system at 3.6 seconds; hence, this is the
minimum possible simulated run time for the interpretation problem).

Finally,  Table  6.4  illustrates  the  effect  of  processor  grid  size  when  the  CT  control  strategy  is 
employed.  This  table  was  produced  with  the  data  rate  set  ten  times  higher  than  that  used  to 
produce Tables 6.1-6.3; the minimum possible simulated run time for the interpretation problem is
0.36  seconds.  The  speedup  achieved  by  increasing  the  processor  grid  size  is  nearly  linear  with  the 
square  root  of  the  size;  however,  the  6  x  6  grid  was  slightly  slower  than  the  5  x  5  grid.  In  this  last 
case,  we  believe  the  data  rate  was  not  high  enough  to  warrant  the  additional  processors. 
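
For readers who want to examine the scaling figures directly, the following Python sketch derives speedup and per-processor efficiency from the Table 6.4 timings. The timings are copied from the table; the function name and the definition of efficiency as speedup divided by processor count are conventions chosen for this example, not taken from the report.

```python
from typing import Dict, List, Tuple

# Grid side length -> simulated time in seconds (Table 6.4, CT strategy).
SIMULATED_TIME_SEC: Dict[int, float] = {
    1: 9.42, 2: 3.20, 3: 1.49, 4: 0.74, 5: 0.52, 6: 0.56,
}


def speedup_table(times: Dict[int, float]) -> List[Tuple[int, int, float, float]]:
    """Return (side, processors, speedup, efficiency) rows relative to 1x1."""
    base = times[1]
    rows = []
    for side in sorted(times):
        processors = side * side
        speedup = base / times[side]
        rows.append((side, processors, speedup, speedup / processors))
    return rows


for side, processors, speedup, efficiency in speedup_table(SIMULATED_TIME_SEC):
    print(f"{side}x{side}: {processors:2d} processors, "
          f"speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

The 0.36-second lower bound mentioned above also caps the achievable speedup for this data rate at 9.42 / 0.36, regardless of grid size.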


6.3  Unanswered  Questions 

CAOS has been a suitable framework in which to construct concurrent signal interpretation systems,
and we expect many of its concepts to be useful in our future computing architectures. Of principal
concern to us now is increasing the efficiency with which the underlying CARE architecture is used.
In addition, our experience suggests a number of questions to be explored in future research:

•  What  is  the  appropriate  level  of  granularity  at  which  to  decompose  problems  for  CARE-like 
architectures? 

•  What  is  the  most  efficient  means  to  synchronize  the  actions  of  concurrent  problem  solvers 
when  necessary? 

•  How  can  flexible  scheduling  policies  be  implemented  without  significant  loss  of  efficiency? 
What  is  the  impact  on  problem  solving  if  alternate  scheduling  policies  are  not  provided? 

We  have  started  to  investigate  these  questions  in  the  context  of  a  new  CARE  environment.  The 
primary  difference  between  the  original  environment  and  the  new  environment  is  that  the  process 
is no longer the basic unit of computation. While the new CARE system still supports the use of
processes,  it  emphasizes  the  use  of  contexts:  computations  with  less  state  than  those  of  processes. 

When  a  context  is  forced  to  suspend  to  await  a  value  from  a  stream,  it  is  aborted,  and  restarted 
from  scratch  later  when  a  value  is  available.  This  behavior  encourages  fine-grained  decomposition 
of  problems,  written  in  a  functional  style  (individual  methods  are  small,  and  consist  of  a  binding 
phase,  followed  by  an  evaluation  phase). 

In  addition,  CARE  now  supports  arbitrary  prioritization  of  messages  delivered  to  streams.  As  a 
result, it is no longer necessary to include in CAOS its complex and expensive scheduling strategy.
Early indications are that the new CARE environment with a slightly modified CAOS environment
performs between two and three orders of magnitude faster than the configuration described in this
paper.

Appendix A

Mergesort: A Simple CAOS Application


Mergesort is an efficient sorting algorithm. It is simple, and well-suited to a concurrent,
message-passing implementation. As mergesort is not a real-time application, we need not be concerned with
the  effects  of  any  data  rate.  Further,  its  run  time  is  determined  entirely  by  the  size  of  the  input;  it 
is  not  sensitive  to  initial  sorting  of  the  data. 

Our  algorithm  recursively  subdivides  the  input  list  into  two  half-size  lists,  until  lists  of  length  2 
are  obtained.  These  lists  are  then  trivially  sorted,  and  recombined  in  sorted  order  as  the  recursion 
is unwound. We exploit the concurrent CAOS architecture by implementing the recursion as
post-value messages sent to other agents. Each processor contains a single mergesort agent. Agents are
assigned in a globally round-robin order, and are created when necessary by a mergesort-manager;
we employ one manager per column in the processor grid (this makes use of a natural invariant
which lets us replicate managers; see our discussion of this approach within ELINT, in Section 3.2).
The  algorithm  adapts  automatically  to  different  processor  grids. 
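
The shape of the algorithm can be sketched in Python, with threads standing in for remote agents and `concurrent.futures` futures standing in for CAOS futures. This is an analogy only: it assumes, as the report's test lists do, that the input length is a power of two and at least 2, and it makes no attempt to model sites, managers, or message costs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def merge(left: List[int], right: List[int]) -> List[int]:
    """Combine two sorted lists into one sorted list."""
    result: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def mergesort(items: List[int]) -> List[int]:
    n = len(items)
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length >= 2")
    if n == 2:
        # Trivial case: lists of length 2 are sorted directly.
        return [min(items), max(items)]
    half = n // 2
    # Start both halves concurrently, then block on each result,
    # much as the agent posts two futures and then reads their values.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(mergesort, items[:half])
        second_future = pool.submit(mergesort, items[half:])
        return merge(first_future.result(), second_future.result())
```

Each call creates its own two-worker pool so that nested calls never wait on a worker that is itself blocked; a single shared, bounded pool could deadlock under this recursion.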

Table A.1 illustrates mergesort's runtime on different processor grids and on various input
lengths. mergesort is well-known to require O(n log n) time on a uniprocessor; similar analysis
indicates mergesort should require O(n) time on an “infinite” number of processors.1 On a grid
of size 1, mergesort implements a very expensive approach to a conventional mergesort (examine
the leftmost column of the table); however, on a sufficiently large grid, the algorithm distributes
computation across enough processors efficiently enough to achieve nearly O(n) time (as seen in
the diagonal boundary of the table).

Table A.1 also illustrates the effects of choosing too small a grain-size for CAOS. mergesort is
dominated by both communication and agent creation costs. It took substantially longer to sort an
8-element list on 4 processors than on 1 processor. Most of this time was spent waiting for answers
from mergesort-manager agents.

1 An infinite number of processors is a sufficient number to prevent any runnable “process” from having to wait for
a free processor; in our implementation of mergesort, this number is n/2. Shapiro's implementation in Concurrent
Prolog achieved O(n) time with O(log n) processors [12].

Table A.1: mergesort runtimes (in milliseconds) on various processor grids and input sizes.


A.1  The mergesort Source Code


This section contains the source code for mergesort. It is intended to show the flavor of
programming in CAOS with a relatively simple example. We show first the code which declares and executes
within the mergesort and mergesort-manager agents.


;;; Global variables controlling assignment of agents to sites

;;; If we were strict, this wouldn't be possible, since we're
;;; making use of the fact that memory in each site really isn't
;;; distributed.  However, we do this to force round-robin
;;; allocation.

(defconst *last-x* 1)
(defconst *last-y* 1)
(defconst *array-width* 1)
(defconst *array-height* 1)

;;; Define the basic mergesort agent
(defagent mergesorter (vanilla-agent)
  (documentation "An agent which can perform a level of mergesorting")
  (symbolically-referenced-agents
    ((mergesorter-1-1) mergesorter)
    ((mergesort-manager-1) mergesort-manager)
    ((mergesort-manager-2) mergesort-manager)
    ((mergesort-manager-3) mergesort-manager)
    ((mergesort-manager-4) mergesort-manager)
    ((mergesort-manager-5) mergesort-manager)
    ((mergesort-manager-6) mergesort-manager))
  (instance-vars
    (known-sorters vp-slot value nil datatype SSdictionary)
    (managers vp-slot value '((1 . mergesort-manager-1)
                              (2 . mergesort-manager-2)
                              (3 . mergesort-manager-3)
                              (4 . mergesort-manager-4)
                              (5 . mergesort-manager-5)
                              (6 . mergesort-manager-6))
              datatype fldictionary))
  (messages-methods (mergesort :mergesort)))

;;; The initialize method clears the dictionary of site-agent
;;; mappings prior to the start of each run.

(defmethod (mergesorter :initialize) (&rest ignore)
  (send self 'known-sorters :initialize))

;;; The next-neighbor method returns a stream to a sorting agent
;;; which will perform half of the next lower-level recursive sort.
(defmethod (mergesorter :next-neighbor) ()
  (let ((next-location-site
          (multiple-value-bind (x y) (next-x-and-y)
            ;; x and y hold site coordinates for the next agent.
            (send (lookup-site x y) :care-site))))
    (let ((maybe-known-agent
            ;; check the dictionary for a site-agent mapping.
            (send self 'known-sorters :get next-location-site)))
      (cond (maybe-known-agent maybe-known-agent)
            (t (let ((next-location
                       (send next-location-site :location)))
                 ;; Don't know the mapping.  Ask a manager.
                 (send self 'known-sorters :put
                       next-location-site
                       (post-value (send self 'managers :get
                                         (first next-location))
                                   nil
                                   :new-agent (first next-location)
                                   (second next-location)))))))))


(defmethod (mergesorter :mergesort) (&rest list)
  (cond ((eq (length list) 2)
         ;; Trivial case.  Lists of length 2.
         `(,(min (first list) (second list))
           ,(max (first list) (second list))))
        (t (let* ((first-neighbor (send self :next-neighbor))
                  (second-neighbor (send self :next-neighbor))
                  ;; Recurse: divide the list and sort both halves.
                  ;; Use post-future to start each half.
                  (first-future
                    (lexpr-funcall #'post-future first-neighbor nil
                                   :mergesort
                                   (copylist (first-half list))))
                  (second-future
                    (lexpr-funcall #'post-future second-neighbor nil
                                   :mergesort
                                   (copylist (second-half list)))))
             ;; Combine the sorted sublists.
             ;; value-future blocks until the half finishes.
             (do ((e1 (value-future first-future)
                      (cond ((null e2) (cdr e1))
                            ((or (null e1) (>= (first e1) (first e2)))
                             e1)
                            (t (cdr e1))))
                  (e2 (value-future second-future)
                      (cond ((null e1) (cdr e2))
                            ((or (null e2) (> (first e2) (first e1)))
                             e2)
                            (t (cdr e2))))
                  (result nil))
                 ((and (null e1) (null e2)) result)
               (cond ((and e1 e2)
                      (setq result (nconc result
                                          (list (min (first e1)
                                                     (first e2))))))
                     (e1 (setq result (nconc result
                                             (list (first e1)))))
                     (t (setq result (nconc result
                                            (list (first e2)))))))))))

;;; Function to maintain globally round-robin agent-site
;;; allocation.

(defun next-x-and-y ()
  (multiple-value-prog1 (values *last-x* *last-y*)
    (when (> (incf *last-x*) *array-width*)
      (setq *last-x* 1)
      (when (> (incf *last-y*) *array-height*)
        (setq *last-y* 1)))))

;;; Return the first half of a list.

(defun first-half (list)
  (loop for i from 1 to (// (length list) 2) as e in list
        collect e))

;;; Return the second half of a list.

(defun second-half (list) (nthcdr (// (length list) 2) list))

;;; Define the mergesort-manager.  These agents, located one
;;; per column in the processor grid, are responsible for
;;; creating new mergesort agents upon request.

(defagent mergesort-manager (vanilla-agent)
  (documentation "An agent to create other mergesorters")
  (instance-vars agent-array)
  (messages-methods (new-agent :new-agent)))

;;; The initialize method clears the manager's mapping of
;;; (x,y) coordinates to mergesort agent.

(defmethod (mergesort-manager :initialize) (max-x max-y)
  (setq agent-array (make-array (list (1+ max-x) (1+ max-y)))))

;;; The new-agent method returns the agent already at
;;; (x,y), or creates a new agent at (x,y) and returns it.
(defmethod (mergesort-manager :new-agent) (x y)
  (cond ((aref agent-array x y))
        (t (let ((the-new-agent (create-agent-instance
                                  'mergesorter
                                  (list x y))))
             (aset the-new-agent agent-array x y)
             the-new-agent))))


This next section of code is the CAOS initialization file which produced the runtime numbers
displayed in Table A.1.

(defconst *the-original-list*
  '( 6  7  4  1  2  8  5  3 16 12  9 11 15 13 10 14
    32 22 30 21 28 19 26 18 24 31 22 29 20 29 25 17
    64 63 62 61 60 59 34 33 58 57 56 55 54 53 52 51
    50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35))

(defconst *the-current-list* nil)

(caos-initialize
  ((mergesorter-1-1 mergesorter (1 1))
   (mergesort-manager-1 mergesort-manager (1 1))
   (mergesort-manager-2 mergesort-manager (2 1))
   (mergesort-manager-3 mergesort-manager (3 1))
   (mergesort-manager-4 mergesort-manager (4 1))
   (mergesort-manager-5 mergesort-manager (5 1))
   (mergesort-manager-6 mergesort-manager (6 1)))
  ((with-open-file (log "z7:schoen.qsort; qsort.log" :write)
     (setq *the-current-list* *the-original-list*)
     (loop with start-time for j from 6 downto 1 do
       (format log "~%Sorting the list:~%~S"
               *the-current-list*)
       (loop for i from 1 to j do
         (multipost-value
           '(mergesort-manager-1 mergesort-manager-2
             mergesort-manager-3 mergesort-manager-4
             mergesort-manager-5 mergesort-manager-6)
           nil :initialize i i)
         (post-value mergesorter-1-1 nil :initialize)
         (format log "~&Starting ~D processor sort at ~D"
                 (* i i) (caos-time))
         (setq start-time (caos-time))
         (lexpr-funcall #'post-value mergesorter-1-1 nil
                        :mergesort *the-current-list*)
         (format log "~&Finished at ~D.  That took ~D ms"
                 (caos-time)
                 (* (- (caos-time) start-time) 1.0e-5)))
       (setq *the-current-list* (first-half *the-current-list*))))))


We conclude with the log file produced by this mergesort execution:

Sorting  the  list: 

(6 7 4 1 2 8 5 3 16 12 9 11 15 13 10 14 32 22 30 21 28 19 26 18 24 31
22 29 20 29 25 17 64 63 62 61 60 59 34 33 58 57 56 55 54 53 52 51 50
49 48 47 46 45 44 43 42 41 40 39 38 37 36 35)
Starting 1 processor sort at 9803527
Finished 1 processor sort at 151163188.  That took 1413.5966 ms
Starting 4 processor sort at 157430828
Finished 4 processor sort at 248600531.  That took 911.697 ms
Starting 9 processor sort at 254848384
Finished 9 processor sort at 330631571.  That took 757.83185 ms
Starting 16 processor sort at 337017977
Finished 16 processor sort at 401035492.  That took 640.1752 ms
Starting 25 processor sort at 407972369
Finished 25 processor sort at 461663705.  That took 536.9133 ms
Starting 36 processor sort at 468137724
Finished 36 processor sort at 519548649.  That took 514.10925 ms
Sorting the list:
(6 7 4 1 2 8 5 3 16 12 9 11 15 13 10 14 32 22 30 21 28 19 26 18 24 31
22 29 20 29 25 17)
Starting 1 processor sort at 526138721
Finished 1 processor sort at 606424159.  That took 802.8544 ms
Starting 4 processor sort at 613038165
Finished 4 processor sort at 673646208.  That took 606.07043 ms
Starting 9 processor sort at 680223869
Finished 9 processor sort at 726796432.  That took 465.72562 ms
Starting 16 processor sort at 733697221
Finished 16 processor sort at 776848166.  That took 431.50943 ms
Starting 25 processor sort at 783606683
Finished 25 processor sort at 830669664.  That took 470.64078 ms
Sorting the list:
(6 7 4 1 2 8 5 3 16 12 9 11 15 13 10 14)
Starting 1 processor sort at 837629049
Finished 1 processor sort at 883646903.  That took 460.17856 ms
Starting 4 processor sort at 890496880
Finished 4 processor sort at 929338867.  That took 388.41986 ms
Starting 9 processor sort at 936242285
Finished 9 processor sort at 971092553.  That took 348.5027 ms
Starting 16 processor sort at 978109126
Finished 16 processor sort at 1012524715.  That took 344.15588 ms
Sorting  the  list: 

(6  7  4  1  2  8  5  3) 

Starting  1  processor  sort  at  1019622193 
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Finished  1  processor  sort  st  1046974696. 
Starting  4  processor  sort  at  1064797480 
Finished  4  processor  sort  at  1094619041. 
Starting  9  processor  sort  at  1101662612 
Finished  9  processor  sort  at  1126786372. 
Sorting  the  list: 

(6741) 

Starting  1  processor  sort  at  1132929674 
Finished  1  processor  sort  at  1145004341. 
Starting  4  processor  sort  at  1162132853 
Finished  4  processor  sort  at  1166264569. 
Sorting  the  list: 

(6  7) 

Starting  1  processor  sort  at  1173665420 
Finished  1  processor  sort  at  1176647734. 


That  took  273.52602  ns 
That  took  397.2176  ns 
That  took  242.0376  as 

That  took  120.746666  ns 
That  took  141.31706  ns 

That  took  30.82314  as 
Appendix B

Implementing the CAOS Framework


This appendix is a guide to the source files which implement the CAOS system. The descriptions
which follow are at a much greater level of detail than those in Chapter 5, and are intended primarily
for readers of the source code, as a supplement to the embedded documentation. It is assumed that
readers of this appendix are familiar with Lisp (principally ZETALISP or Common Lisp), and
have read Chapter 5.


B.1 General Programming Issues

All data structures are implemented with the defstruct mechanism. defstruct accepts a description
of the desired data structure, and produces a number of macro definitions which serve to create
new instances of the structure, and to access and modify fields of the structure. For example, a ship
data structure may be defined as having fields name, position, and course. New instances of ships
are created by calling make-ship; the fields of the ship structure are accessed by calling ship-name,
ship-position, and ship-course. A field may be modified by embedding a field access function
in a setf expression.
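
The ship example can be sketched as follows. This is written in Common Lisp syntax rather than ZETALISP, and the field values are invented purely for illustration.

```lisp
;; Illustrative sketch of the ship example; the field values are invented.
(defstruct ship
  name        ; here a string naming the vessel
  position    ; here a list of (x y) coordinates
  course)     ; here a heading in degrees

(defvar *demo-ship*
  (make-ship :name "Enterprise" :position '(10 20) :course 90))

;; Field access goes through the generated accessor functions.
(ship-name *demo-ship*)
(ship-position *demo-ship*)

;; A field is modified by wrapping its accessor in SETF.
(setf (ship-course *demo-ship*) 270)
```

After the final form, (ship-course *demo-ship*) returns the new heading.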

The CAOS system is intended for use in ZETALISP-compatible environments. The system was
developed  originally  on  the  Symbolics  3600  family  of  workstations,  and  was  later  ported  to  the 
Texas  Instruments  Explorer  workstation.  These  machines  each  support  a  ZETALISP  programming 
environment,  but  are  not  completely  source-code  compatible. 

Source-level incompatibilities are handled by use of the #+ and #- reader macros. An occurrence
of #+Symbolics in a source file causes the next s-expression to be read only when the file is
being loaded into a Symbolics workstation; an occurrence of #-Symbolics prevents the following
s-expression from being loaded into a Symbolics workstation. Similar read-time conditionals for the
TI environment are introduced by #+TI and #-TI constructs.
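
A minimal sketch of how such read-time conditionals look in practice is given below. The function name host-environment is hypothetical, and the sketch assumes that SYMBOLICS and TI are the feature names present in each environment's features list.

```lisp
;; Sketch only: host-environment is a made-up name, and SYMBOLICS and TI are
;; assumed to be the feature names found in *FEATURES* on each machine.
(defun host-environment ()
  "Return a keyword naming the environment in which this form was read."
  (or #+symbolics :symbolics      ; read only on a Symbolics machine
      #+ti        :ti             ; read only on a TI Explorer
      :other))                    ; the only form read when neither feature is present
```

In an environment with neither feature, the reader sees only (or :other), so the conditional forms never reach the evaluator at all.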


B.2  Interface  to  CARE 


In order to function properly under the CARE simulator, all CAOS code and CAOS applications must
be loaded into the care-user symbol package. This package is defined to inherit from CARE those
symbols (e.g., functions, variables, and macros) which comprise the exported CARE programming
interface.

B.2.1  CARE  Data  Structures 

The following CARE-defined data structures are used by CAOS:

remote-address [Structure]

A remote-address is the global encapsulation for the address of a data structure located
on a particular processor. It may be thought of as extending the address space of a site
with additional address bits that identify the site in the processor grid.
A remote-address contains two fields: site and local. The site field identifies the site
on which the structure pointed to by the local field resides.

site [Structure]

A site represents one of the processing nodes in the grid. An instance of a site
structure is actually an instance of a site flavor, and hence, fields of a site are accessed
by sending Flavors messages. The following messages are relevant to CAOS: :location,
which returns the (x, y) coordinate of the site in the grid; :x-site, which returns the x
coordinate of the site; and :y-site, which returns the y coordinate.

queue [Structure]

A queue implements FIFO storage, and is used in a number of places within CARE.
In particular, packets arriving on a CARE stream are stored in a queue. The queue
structure has the following relevant fields: length, body, tail. The length field stores
the number of entries which are currently in the queue; the body field points to a list
which implements storage for the queue; the tail field points to the last element of the
body of the queue, and allows new entries to be appended to the end of the queue in O(1)
time (access to the head of the queue also requires O(1) time).

stream [Structure]

A stream is a virtual circuit which carries data (in the form of packets) between processes.
Operations on streams are performed by the functions post-packet and accept-packet,
which are described below. The packets field of a stream contains the queue of packets
which have arrived on the stream. The properties field of a stream contains an arbitrary
property list; CAOS uses the property list to store information to help the function which
prints out streams in a human-readable fashion. Other fields of the stream are not
relevant to CAOS.
process [Structure]

A process is the basic unit of computation in CARE. The innards of a process are
of no concern to CAOS; however, it should be noted that the special variable
***care-process*** is always bound to the process structure of the process currently executing.

B.2.2  CARE  Functions  and  Macros 

The following functions and macros are used by CAOS:

post-packet &optional form &key ... [Macro]

The macro post-packet is used to create new streams and new processes, and to exchange
messages between processes. If called with no arguments, it returns a new stream
instance. All other post-packet options are controlled by the existence of various keywords
in its argument list. When keyword arguments are supplied, the first argument
to post-packet is evaluated to form the message to be sent.

The  following  keyword  options  are  employed  by  CAOS: 

to: The value of the to keyword is a stream or list of streams to which the message will
be sent.

for:  The  value  of  the  for  keyword  is  a  stream  or  list  of  streams.  When  the  message  is 
received  remotely,  the  value  of  this  keyword  will  appear  in  the  clients  field  of  the 
message. 

for-new-stream, process: These two keywords always appear together in an argument
list, and take no arguments. They are included in a call to post-packet
to create new processes. The first argument in such a call is a form to evaluate
remotely to start the process. This call also requires a to keyword argument, which
must be a remote-address; the process is created on the site indicated by the site
field.

The value of the call is a stream. A call to accept-packet on this stream will
return a packet whose value field is the default stream supplied to the newly-created
process.

after: The value of the after keyword is a time interval, in microseconds. When this
keyword is supplied, the message will be delivered after a corresponding delay. The
purpose of the keyword is to provide a means of implementing timeouts. A
process can cause a packet to be posted to a stream only after a specified interval;
when this packet arrives, any processes waiting on the stream will be awakened.
CAOS implements "clocked futures" using this mechanism.

tagged:  The  tagged  keyword  provides  a  means  of  tagging  the  message  with  a  user- 
supplied  value;  its  principal  use  is  in  debugging  and  message  tracing. 

with-packet-bindings stream-form bindings &body forms [Macro]

The with-packet-bindings macro evaluates stream-form, which must return a stream.
It then picks the first packet from the stream (or blocks the calling process until a packet
arrives), and (lambda) binds portions of the packet to the variables specified in bindings.

The format of bindings is a list. The first variable name in the list is bound to the
contents of the message; the second is bound to the clients of the message (e.g., the
streams specified by the for keyword in the call to post-packet). Additional variables
may be bound to fields which are not relevant in the discussion of CAOS.

accept-packet stream [Function]

The macro with-packet-bindings is defined in terms of this function. accept-packet
is called with stream bound to a stream, and returns the first packet waiting in the
stream (or blocks the calling process until a packet is available).

defprocess [Macro]

The defprocess macro is syntactic sugar for defun. Any function which is to be the
top-level of a CARE process should be defined using defprocess. The last argument
in the argument list of a function defined by defprocess will be bound to the default
stream for the process; thus, any function defined with defprocess must have at least
one argument.

B.3  The  CAOS  Support  Environment 

In Chapter 5, we described an extension to Flavors which implements abstract data type support for
instance variables. The files herbs.lisp, sage.lisp, datatype.lisp, and priority-queue.lisp
comprise the framework which includes abstract data type support. In addition, these files contain
code which implements a sort of inheritance of default values of instance variables, and code which
implements substructure for instance variables.

B.3.1 Herbs.Lisp

This file implements a form of inheritance of list-structured default values of instance variables. The
Flavors class hierarchy forms a taxonomy; classes defined far from the root of the taxonomy are
more specialized than those defined near the root. Within a class, methods can be combined with
methods of the same name in ancestral classes in quite a few ways. Unfortunately, Flavors provides
no means of combining inherited values.

Consider the example of Figure B.1. The Flavor class flavor-3 is defined as a subclass of classes
flavor-1 and flavor-2. Both flavor-1 and flavor-2 define an instance variable called iv-a.
What value does flavor-3 inherit as the default for iv-a?

In normal Flavors, flavor-3 would inherit '(a b c) as the default value. However, there are
situations in which the proper value to inherit for iv-a might be '(a b c d e f). The defherb
macro, defined in herbs.lisp, enables this sort of inheritance.

Figure B.2 illustrates three possible inheritance modes for the default value of iv-a in flavor-3.
In the first example, the default value of iv-a will be '(a b c d e f). In the second example, its
value will be '(a b c d e f g h i). In the final example, its value will be '(b d f).


(defflavor flavor-1 ((iv-a '(a b c))) ())

(defflavor flavor-2 ((iv-a '(d e f))) ())

(defflavor flavor-3 () (flavor-1 flavor-2))

Figure B.1: Multiple inheritance example.

(defherb flavor-3 ((iv-a + nil)) ())

(defherb flavor-3 ((iv-a + '(g h i))) ())

(defherb flavor-3 ((iv-a - '(a c e))) ())

Figure B.2: defherb examples.
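
The three inheritance modes can be modelled in a few lines of Common Lisp. This is only a sketch of the combination rule implied by the examples, not the defherb implementation: the function name combine-defaults is hypothetical, and whether defherb removes duplicate elements is not stated in the text.

```lisp
;; Hypothetical helper, not the defherb implementation: combine the default
;; values inherited from the parent flavors according to a mode.
(defun combine-defaults (mode local inherited-defaults)
  "MODE is the symbol + or -.  INHERITED-DEFAULTS is a list of the parents'
default lists, in inheritance order."
  (let ((inherited (apply #'append inherited-defaults)))
    (ecase mode
      ;; + appends the locally supplied elements to the inherited ones.
      (+ (append inherited local))
      ;; - removes the locally supplied elements from the inherited ones.
      (- (remove-if (lambda (item) (member item local)) inherited)))))

(combine-defaults '+ nil      '((a b c) (d e f)))   ; => (A B C D E F)
(combine-defaults '+ '(g h i) '((a b c) (d e f)))   ; => (A B C D E F G H I)
(combine-defaults '- '(a c e) '((a b c) (d e f)))   ; => (B D F)
```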


B.3.2 Sage.Lisp

This file implements structured and abstract data type support for instance variables. Both facilities
depend on storing special-purpose structures, known as vp-slots, in instance variables. Descriptions
of the vp-slot structure, and the important functions which access it, follow (many of the
concepts used here come from the Strobe system [13]):

vp-slot [Structure]

A vp-slot contains three primary fields. The value field holds the "value" of the slot.
The datatype field holds an indication of what sort of objects will reside in the value
field of the slot. Finally, the user-defined-facets field holds an association list of
arbitrary facet names and values; new facets may be added at any time.

A vp-slot may be thought of as a value with arbitrary annotations (all slots are annotated
with a datatype facet). These annotations might permit a program to reason
about the contents of the slot when necessary.

getfacet object slot &optional (facet 'value) errorflg novalueflg [Function]

The function getfacet returns the value of facet in slot of object. Facet defaults to
value, which retrieves the value field of the vp-slot. Other acceptable bindings for
facet are datatype, plus any facet in the user-defined-facets field of the slot. If the
facet doesn't exist, and the value of errorflg is non-nil, a fatal error will occur. If the
value of the facet is *novalue*, and novalueflg is nil, the value returned from getfacet
will be nil; otherwise, it will be the value found in the facet.

putfacet object slot &optional (facet 'value) (value '*novalue*) errorflg [Function]

The function putfacet puts value into facet of slot of object. If the facet doesn't exist,
it is first created. If the slot doesn't exist (e.g., the instance variable named slot doesn't
exist, or doesn't contain an object of type vp-slot) and errorflg is non-nil, a fatal error
is signalled.
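
The facet lookup rules can be sketched in Common Lisp as follows. This is a simplified model under stated assumptions: it works on a bare slot structure rather than on an instance variable of a Flavors object, it uses the hypothetical names demo-vp-slot, slot-facet and set-slot-facet to avoid confusion with the real functions, and it treats a missing facet as *novalue* when errorflg is nil, which the text does not specify.

```lisp
;; Simplified model of the facet rules; all names here are hypothetical.
(defstruct demo-vp-slot
  (value '*novalue*)
  datatype
  (user-defined-facets '()))   ; association list of (facet . value)

(defun slot-facet (slot &optional (facet 'value) errorflg novalueflg)
  "Return the contents of FACET in SLOT, following the getfacet rules."
  (let ((result
          (case facet
            (value (demo-vp-slot-value slot))
            (datatype (demo-vp-slot-datatype slot))
            (t (let ((entry (assoc facet (demo-vp-slot-user-defined-facets slot))))
                 (cond (entry (cdr entry))
                       (errorflg (error "No facet ~S in slot." facet))
                       ;; Assumption: a missing facet reads as *novalue*.
                       (t '*novalue*)))))))
    ;; *novalue* is reported as NIL unless the caller asks to see it.
    (if (and (eq result '*novalue*) (not novalueflg))
        nil
        result)))

(defun set-slot-facet (slot facet value)
  "Store VALUE in FACET of SLOT, creating the facet if necessary."
  (case facet
    (value (setf (demo-vp-slot-value slot) value))
    (datatype (setf (demo-vp-slot-datatype slot) value))
    (t (let ((entry (assoc facet (demo-vp-slot-user-defined-facets slot))))
         (if entry
             (setf (cdr entry) value)
             (push (cons facet value)
                   (demo-vp-slot-user-defined-facets slot))))))
  value)
```

Storing facets in an association list keeps the slot open-ended, which matches the statement that new facets may be added at any time.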

#_ [Reader Macro]

Unfortunately, by placing vp-slot structures in instance variables of Flavor instances, it
becomes impossible to simply get the "value" of the instance variable (since the value is
now a vp-slot). The #_ reader macro is a piece of syntactic sugar which expands to the
form (vp-slot-value ...), and hence, retrieves the value field of the slot. Therefore,
references to instance variables which contain slots can be preceded by #_ to retrieve the
actual value of the slot.

A  number  of  macros  are  defined  in  terms  of  these  basic  functions;  their  function  should  be  clear 
from  examination  of  the  source  code. 

Abstract  Data  Type  Support 

Abstract data type support for instance variables is implemented by forwarding messages sent to
vp-slots to the objects pointed to by their datatype fields. Consider the example in Figure B.3.
The inclusion of the :gettable-instance-variables option in the definition of flavor-1 causes
instances of flavor-1 to respond to :iv-a messages (note the colon in the message name); instances
of flavor-1 do not respond to the iv-a message.

Normally, when a message for which no method is defined is sent, an error occurs; however, it is
possible to define an :unclaimed-method method for a Flavors class. The :unclaimed-method is
invoked when an undefined message is sent. The file sage.lisp defines a Flavors class, sage-class,
which has just this sort of :unclaimed-method.

When an undefined message is sent to a Flavors instance which has sage-class as an ancestor,
the following steps are taken:

1. If the message is actually the name of an instance variable in the instance, the message name
is evaluated (using symeval-in-instance) to retrieve the value of the variable.

2. If the value of the variable is a structure of type vp-slot, a message is sent to the Flavors
instance stored in the datatype field of the slot. The message name is taken from the first
"argument" of the unclaimed message. The arguments in the message are the Flavors instance
to which the message was originally sent, the name of the instance variable to which the
message was sent, and all but the first of the original arguments of the unclaimed message.

Now consider the course of events when (send instance-1 'iv-a :get 'b) is evaluated:

1. The message iv-a is received by instance-1.

2. instance-1 does not handle the message iv-a, so the message is forwarded to the
:unclaimed-method method defined by sage-class.
(defflavor association-list () ())

(defmethod (association-list :get) (instance iv key)
  (cdr (assq key (getvalue instance iv))))

(defvar assn-instance (make-instance 'association-list))

(defflavor flavor-1
  ((iv-a (make-vp-slot value '((a . 1) (b . 2) (c . 3))
                       datatype assn-instance)))
  (sage-class)
  :gettable-instance-variables)

(defvar instance-1 (make-instance 'flavor-1))

Figure B.3: A Flavor containing a slot


3. The :unclaimed-method code evaluates iv-a in the context of instance-1, and discovers the
value to be a structure of type vp-slot. It then effectively evaluates the following: (send
assn-instance :get instance-1 'iv-a 'b).

4. The :get method of association-list is called. It uses its first two arguments to retrieve
the association list from the value field of the vp-slot to which the message was originally
directed. It then uses its third argument to return the value of an association from the list.

5. The value returned by the :get method of the vp-slot's datatype is returned as the value of
the original message.
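
The forwarding steps above can be imitated without Flavors. The following Common Lisp sketch is a toy model only: an "instance" is a property list of instance variables, a datatype is an association list from message names to functions, and every name in it (demo-slot, send-unclaimed, and so on) is hypothetical.

```lisp
;; Toy model of message forwarding; it does not use Flavors, and all names
;; are made up for this illustration.
(defstruct demo-slot value datatype)

;; A "datatype" maps message names to handler functions.  Each handler takes
;; the original object, the instance variable name, and the remaining arguments.
(defparameter *association-list-datatype*
  (list (cons :get
              (lambda (object iv key)
                (cdr (assoc key (demo-slot-value (getf object iv))))))))

(defun send-unclaimed (object iv message &rest args)
  "Forward MESSAGE to the datatype of the slot stored in variable IV of OBJECT."
  (let ((slot (getf object iv)))
    (unless (demo-slot-p slot)
      (error "~S does not name a slot-valued instance variable." iv))
    (let ((handler (cdr (assoc message (demo-slot-datatype slot)))))
      (unless handler
        (error "The datatype of ~S does not handle ~S." iv message))
      (apply handler object iv args))))

(defparameter *instance-1*
  (list 'iv-a (make-demo-slot :value '((a . 1) (b . 2) (c . 3))
                              :datatype *association-list-datatype*)))

(send-unclaimed *instance-1* 'iv-a :get 'b)   ; => 2
```

As in the text, the handler receives the original object and the variable name first, so one datatype object can serve any number of slots.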


A  number  of  macros  are  defined  for  the  convenience  of  programmers: 


defdatatype [Macro]

Defines a new Flavors class suitable for use as an abstract data type. This is syntactic
sugar for combining defflavor and defmethod into one textual unit. For example,
the above definition of association-list could have been made by evaluating:

(defdatatype association-list "Implements a-list dictionaries."
  (:get (instance iv key)
    (cdr (assq key (getvalue instance iv)))))


#t [Reader Macro]

This reader macro accepts the name of a datatype class, and returns an instance of the
class. If no instances of the class have been created, it creates one and stores it in a hash
table (*sage-datatype-hash-table*). This reader macro is used in creating slots:

(defflavor flavor-1
  ((iv-a (make-vp-slot value '((a . 1) (b . 2) (c . 3))
                       datatype #tassociation-list)))
  ())


B.3.3 Datatype.Lisp and Priority-Queue.Lisp

These files use the facilities defined by sage.lisp and herbs.lisp to define a number of useful
abstract data types. In general, these ADTs respond to an :initialize message to initialize
themselves to an "empty" state, a :put message to add items to themselves, and a :get message
to remove items from themselves.

queue [Abstract Data Type]

The queue data type implements FIFO storage in an instance variable. The current
implementation uses lists maintained by the tconc function, defined in datatype.lisp.
The :initialize message empties the queue, the :put message enqueues an entry on the
end of the queue, and the :get message dequeues an entry from the front of the queue.

If the instance variable in which the queue resides has a max-length facet, entries are
added to the queue if-and-only-if the current length of the queue is less than the specified
maximum length.

Two values are returned by a :put message. The first value is t if there was room
to append the new entry; the second value is the value appended to the queue. Two
values are also returned by the :get message. The first is the value found at the head
of the queue; the second is nil if the queue was empty before the message, or t if it was
non-empty.

All operations defined for a queue require O(1) time.
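
A bounded FIFO queue with these return conventions can be sketched as follows. This is not the datatype.lisp code: it is a standalone Common Lisp structure with hypothetical names (fifo, fifo-put, fifo-get) that keeps a pointer to the last cons cell, which is the idea behind a tconc-style list.

```lisp
;; Sketch only: a standalone queue, not the Flavors-based ADT.  The tail
;; pointer gives O(1) append, as a tconc-style list does.
(defstruct fifo
  (head '())          ; list of entries, oldest first
  (tail '())          ; last cons cell of HEAD
  (length 0)
  (max-length nil))   ; NIL means unbounded

(defun fifo-put (queue entry)
  "Return two values: T if ENTRY was appended (NIL if full), and ENTRY."
  (let ((limit (fifo-max-length queue)))
    (if (and limit (>= (fifo-length queue) limit))
        (values nil entry)
        (let ((cell (list entry)))
          (if (fifo-tail queue)
              (setf (cdr (fifo-tail queue)) cell)
              (setf (fifo-head queue) cell))
          (setf (fifo-tail queue) cell)
          (incf (fifo-length queue))
          (values t entry)))))

(defun fifo-get (queue)
  "Return two values: the head entry, and T if the queue was non-empty."
  (if (null (fifo-head queue))
      (values nil nil)
      (let ((entry (pop (fifo-head queue))))
        (when (null (fifo-head queue))
          (setf (fifo-tail queue) '()))
        (decf (fifo-length queue))
        (values entry t))))
```

The second value of fifo-get lets a caller distinguish an empty queue from a queue whose head entry happens to be nil.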

dictionary [Abstract Data Type]

The dictionary is a fuller version of the association-list ADT described above.
The :put and :get operations require O(n) time; hence, the dictionary
datatype should be used only when the number of entries is expected to be small. In
addition to :initialize, :put, and :get messages, the dictionary also responds to the
following messages:

:add key value [Datatype Message]

Adds value as an additional value to be associated with key. A :get message on key will
subsequently return lists of two or more values. Requires O(n) time.

:forget key [Datatype Message]

Removes the entry associated with key from the dictionary. Requires O(n) time.


:map function [Datatype Message]

Applies function to each entry in the dictionary. Function must be a function of two
arguments; the first argument will receive the key of an entry, and the second will receive
the value of the key. Requires O(n) time.

:new-id [Datatype Message]

Returns a key which is guaranteed not to be in the dictionary. This is currently implemented
using gensym, and as such, requires O(1) time.

:number-of-entries [Datatype Message]

Returns the number of entries in the dictionary. Requires O(1) time.

:all-entries [Datatype Message]

Returns all of the entries in the dictionary, in association-list format. Requires O(1)
time.

sorted-dictionary [Abstract Data Type]

The sorted-dictionary is a variant of the dictionary which keeps its entries in sorted
order, as defined by a user-supplied comparison function. It responds to the same messages
as does the dictionary. The time complexity of operations defined for a
sorted-dictionary is equivalent to that of the operations defined for a dictionary.

The comparison function must be a predicate of two arguments, and must return t
if-and-only-if the first argument is "greater" than the second argument. For example, if
the keys represent timestamps, and the dictionary is to keep the keys sorted in ascending
order, the comparison function can be specified as #'<, the lessp function.

In addition to the messages defined by the dictionary data type, the
sorted-dictionary also responds to these messages:

:greatest-entry [Datatype Message]

The :greatest-entry message returns the key having the "greatest" value, as defined
by the comparison function. Because the dictionary is kept in sorted order, this operation
requires only O(1) time.

:next-entry n [Datatype Message]

The :next-entry message returns the key of the entry having the next "greatest" value
to that of n. This is an O(n) operation.

hash-dictionary [Abstract Data Type]

The hash-dictionary is a dictionary implementation which is based on hash tables,
rather than association lists. It responds to the same messages as does the dictionary
ADT. Its advantage over the dictionary is that insertion, lookup, and deletion operations
are all of O(1) time complexity; however, the enumeration message, :all-entries,
is of O(n) time complexity.

monitor [Abstract Data Type]

The monitor data type is a special purpose ADT which aids in the implementation of
lexically-scoped mutual exclusion. Storage for the monitor is implemented by a monitor
structure:

monitor [Structure]

The monitor structure contains two fields: owner, which points to the process which
currently owns the monitor; and waiting-processes, which is a queue of processes
waiting to obtain ownership of the monitor.

:enter wakeup-stream [Datatype Message]

A process wishing to enter a region of mutual exclusion sends this message. If the
monitor is unowned, the owner is set to the value of ***care-process***, and the
caller is allowed to enter the region of mutual exclusion.

If the monitor is currently owned, a dotted pair, consisting of the value of
***care-process*** and wakeup-stream, is enqueued on the waiting-processes queue of the
monitor. The caller then calls accept-packet in order to suspend execution. When the
caller's request reaches the head of the queue, a packet will be sent to wakeup-stream,
restarting the suspended caller.

:exit [Datatype Message]

The :exit message relinquishes ownership of the monitor, and restarts the next process
waiting to obtain it (if any).

If the waiting-processes queue is non-empty, the first entry on the queue is dequeued.
The entry contains the process handle of the waiting process, which is placed in the
owner field of the monitor, and the stream upon which to send the wakeup message.

If the queue is empty, the owner field of the monitor is set to nil, so that the monitor
is marked as unowned.
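
The ownership hand-off performed by the two messages can be modelled in a single-threaded Common Lisp sketch. This is an illustration of the protocol only: processes are plain symbols, "sending a wakeup packet" is modelled by pushing the wakeup stream onto a list, and the names demo-monitor, monitor-enter, monitor-exit and *wakeups* are hypothetical. CARE's real processes and streams are not used.

```lisp
;; Single-threaded model of the enter/exit protocol; all names are made up.
(defstruct demo-monitor
  (owner nil)
  (waiting-processes '()))   ; FIFO list of (process . wakeup-stream)

(defvar *wakeups* '()
  "Wakeup streams that have been signalled, newest first.")

(defun monitor-enter (monitor process wakeup-stream)
  "Return :ENTERED if PROCESS now owns MONITOR, or :WAITING if it was queued."
  (cond ((null (demo-monitor-owner monitor))
         (setf (demo-monitor-owner monitor) process)
         :entered)
        (t
         ;; Appending to a list is O(n); a real queue would be used instead.
         (setf (demo-monitor-waiting-processes monitor)
               (append (demo-monitor-waiting-processes monitor)
                       (list (cons process wakeup-stream))))
         :waiting)))

(defun monitor-exit (monitor)
  "Pass ownership to the first waiting process, or mark MONITOR unowned.
Return the new owner (NIL if there is none)."
  (let ((next (pop (demo-monitor-waiting-processes monitor))))
    (cond (next
           (setf (demo-monitor-owner monitor) (car next))
           ;; Stand-in for posting a packet to the waiter's wakeup stream.
           (push (cdr next) *wakeups*))
          (t
           (setf (demo-monitor-owner monitor) nil)))
    (demo-monitor-owner monitor)))
```

Note that ownership passes directly to the next waiter on exit, so a newly arriving process cannot overtake a process that is already queued.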

with-monitor monitor-name &body forms [Macro]

This macro implements error-protected, lexically-scoped mutual exclusion. Monitor-name
must be the name of an instance variable in the Flavors instance currently bound
to self which holds a monitor. Upon entry to this macro, an :enter message is sent to
the monitor to gain entrance. The expressions in forms are then executed under
unwind-protect protection, such that if an error occurs during their execution, the monitor is
guaranteed to be released.

This macro is equivalent to the with.monitor macro of Interlisp-D.


without-monitor monitor-name &body forms [Macro]

This macro is intended to be used within the scope of a with-monitor form. Its purpose
is to temporarily release ownership of the monitor specified by monitor-name (using the
:exit method), and then to reobtain it (using the :enter method) after the forms in
forms have been executed. Typically, forms will contain an expression that causes the
calling process to suspend for some period of time (or until a packet arrives on some
stream).

This macro is similar in spirit to the monitor.await.event macro of Interlisp-D.

priority-queue [Abstract Data Type]

The priority-queue data type and the code needed to implement it are contained in
the file priority-queue.lisp. The bulk of this file is a set of ZetaLisp routines which
implement a dynamic, Heapsort-style priority queue. The implementation is derived
from algorithms DELETEMIN and INSERT, from section 4.11 of [1]. Insertion into and
deletion from this queue both require O(n log n) time.

priority-queue  [Structure] 

The  priority-queue  structure  implements  storage  at  the  nodes  of  the  partially-ordered 
binary  tree.  It  has  fields  left-child,  right-child,  and  item.  In  addition,  for  conve¬ 
nience,  it  has  a  priority-function  field  which  stores  the  priority-computing  function 
for  entries  in  the  tree. 

exchange-nodes  top  bottom  [Macro] 

This  macro  exchanges  the  contents  of  nodes  top  and  bottom. 

insert-in-queue queue node [Function]

This function inserts node, an instance of a priority-queue structure, into the tree
rooted by queue. It recursively descends into the tree, heading for the leftmost free node
at the lowest level of the tree (creating a new level if necessary). As it unwinds from
the recursion, it exchanges nodes as necessary to maintain the partial order. The value
returned from this function is the new root of the tree, which may have changed.

rebalance-queue queue [Function]

This function rebalances the tree rooted at queue after its root has been removed.

remove-from-queue queue [Function]

This function removes the item from the partially-ordered tree rooted at queue, and
rebalances the tree to maintain the partially-ordered invariant. It returns two values:
the value found at the root, and a pointer to the new root of the tree.
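
The partial-order invariant maintained by these functions can be illustrated with a small heap. The sketch below is a deliberate substitution, not the structure described above: it stores the tree in an adjustable vector instead of linked nodes with left-child and right-child fields, it treats smaller priorities as more urgent, and its names (make-heap, heap-insert, heap-remove) are hypothetical.

```lisp
;; Sketch only: a vector-based binary heap standing in for the linked tree.
;; Entries are (priority . item) pairs; the smallest priority is at index 0.
(defun make-heap ()
  (make-array 0 :adjustable t :fill-pointer 0))

(defun heap-insert (heap item priority)
  "Add ITEM with numeric PRIORITY, sifting it up to restore the partial order."
  (vector-push-extend (cons priority item) heap)
  (loop with child = (1- (length heap))
        while (plusp child)
        do (let ((parent (floor (1- child) 2)))
             (if (< (car (aref heap child)) (car (aref heap parent)))
                 (progn (rotatef (aref heap child) (aref heap parent))
                        (setf child parent))
                 (return))))
  heap)

(defun heap-remove (heap)
  "Remove and return the item with the smallest priority, or NIL if empty."
  (when (plusp (length heap))
    (let ((top (aref heap 0))
          (moved (vector-pop heap)))
      (when (plusp (length heap))
        ;; Move the last entry to the root, then sift it down.
        (setf (aref heap 0) moved)
        (loop with parent = 0
              do (let* ((left (+ (* 2 parent) 1))
                        (right (+ left 1))
                        (smallest parent))
                   (when (and (< left (length heap))
                              (< (car (aref heap left))
                                 (car (aref heap smallest))))
                     (setf smallest left))
                   (when (and (< right (length heap))
                              (< (car (aref heap right))
                                 (car (aref heap smallest))))
                     (setf smallest right))
                   (if (= smallest parent)
                       (return)
                       (progn (rotatef (aref heap parent) (aref heap smallest))
                              (setf parent smallest))))))
      (cdr top))))
```

The exchange step (rotatef here) plays the role that exchange-nodes plays in the linked-tree version.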
sorting-spec      ::= (key-spec . sorting-spec) | nil
key-spec          ::= (key-name . field-spec-list)
field-spec-list   ::= (field-spec . field-spec-list) | nil
field-spec        ::= (field-computation . predicate)
field-computation ::= field-arg | (field-op . field-arg-list)
field-arg-list    ::= (field-arg . field-arg-list) | nil
field-op          ::= any-lisp-function
key-name          ::= any-lisp-symbol
field-arg         ::= field-number | ' any-valued-lisp-symbol
field-number      ::= any-lisp-integer
predicate         ::= any-lisp-predicate

Figure B.4: BNF Grammar for declaring sorting functions.

((:site ((+ (* 0 '16) 1) . <))
 (:agent (2 . alphalessp))
 (:task (3 . <)))

Figure B.5: A sample sorting specification.

B.4  Instrumentation  for  CAOS 

The CARE system comes supplied with a wide variety of “instrument panels” which report how
various components of the simulated execution architecture are being utilized. Much of CAOS was
defined prior to the existence of these instruments, and the file pravda.lisp contains vestigial
remnants of an interim CAOS-based instrumentation package. This package is no longer in use,
and it will not be documented here, although it is part of the CAOS sources. There are, however,
CAOS-specific instrument panels which are still in use. These panels are documented in this section.

B.4.1 Scrolling-Text-Panel.Lisp

The file scrolling-text-panel.lisp contains an instrument which displays information in a sorted
order in a ZETALISP-defined window known as a tv:scroll-window. Such windows are designed
to display a structured representation of data; new lines of information may be added or deleted
dynamically, and the window may be scrolled vertically if more information is being displayed than
can fit in the window.

The scrolling-text-panel is a tv:scroll-window whose sorting order and display formatting
commands are specified by a simple, declarative grammar. The declaration of the sorting function
is specified in the :sorted-by instance variable of the panel; the formatting function is specified
by the :printed-by and :formatted-by instance variables. We first describe the grammar as it
pertains to sorting.
The sorting grammar is described in BNF format in Figure B.4;[1] an example from CAOS appears
in Figure B.5. Unquoted numbers used in field-number positions refer to corresponding elements of
the vector that holds the information driving the sorting and display functions.

The sorting declaration in Figure B.5 constructs three sorting functions, indexed respectively by
the keywords :site, :agent, and :task. The :site sorting function is compiled into the following
piece of Lisp code:[2]

(defun foo-site-sorter (item-1 item-2)
  (let ((entry-1 (array-leader item-1 (1+ tv:scroll-item-leader-offset)))
        (entry-2 (array-leader item-2 (1+ tv:scroll-item-leader-offset))))
    (< (+ (* (nth 0 entry-1) 16) (nth 1 entry-1))
       (+ (* (nth 0 entry-2) 16) (nth 1 entry-2)))))

The :agent sorting function is a refined version of the :site sorting function. It expands into:

(defun foo-agent-sorter (item-1 item-2)
  (let ((entry-1 (array-leader item-1 (1+ tv:scroll-item-leader-offset)))
        (entry-2 (array-leader item-2 (1+ tv:scroll-item-leader-offset)))
        (key-2 (array-leader item-2 tv:scroll-item-leader-offset)))
    (cond ((foo-site-sorter item-1 item-2) t)
          ((equal item-1 item-2)
           (cond ((memq key-2 '(:site)) nil)
                 (t (alphalessp (nth 2 entry-1) (nth 2 entry-2))))))))

The :task sorting function is further refined, and expands to:

(defun foo-task-sorter (item-1 item-2)
  (let ((entry-1 (array-leader item-1 (1+ tv:scroll-item-leader-offset)))
        (entry-2 (array-leader item-2 (1+ tv:scroll-item-leader-offset)))
        (key-2 (array-leader item-2 tv:scroll-item-leader-offset)))
    (cond ((foo-agent-sorter item-1 item-2) t)
          ((equal item-1 item-2)
           (cond ((memq key-2 '(:site :agent)) nil)
                 (t (< (nth 3 entry-1) (nth 3 entry-2))))))))
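To make the idea of deriving progressively refined sorters from a declaration more concrete, here is a minimal Python sketch. It is not a translation of the generated Lisp: it assumes one field-spec per key, represents a quoted constant as a ("quote", value) tuple, and resolves ties by falling through to the next key. The names field_value, make_sorters, and SPEC are hypothetical.

```python
import operator


def field_value(computation, entry):
    """Evaluate a field computation against an entry vector."""
    if isinstance(computation, int):
        return entry[computation]             # unquoted number: index into entry
    if computation[0] == "quote":
        return computation[1]                 # quoted constant, e.g. '16
    op, *args = computation                   # (field-op . field-arg-list)
    return op(*(field_value(arg, entry) for arg in args))


def make_sorters(spec):
    """Return one less-than function per key; each refines the earlier keys."""
    sorters, levels = {}, []
    for key, computation, predicate in spec:
        levels = levels + [(computation, predicate)]

        def sorter(a, b, levels=levels):
            for comp, pred in levels:
                va, vb = field_value(comp, a), field_value(comp, b)
                if pred(va, vb):
                    return True
                if pred(vb, va):
                    return False
            return False                      # equal on every level

        sorters[key] = sorter
    return sorters


# Analogue of the sample specification: site = x * 16 + y, then name, then time.
SPEC = [
    ("site", (operator.add, (operator.mul, 0, ("quote", 16)), 1), operator.lt),
    ("agent", 2, operator.lt),
    ("task", 3, operator.lt),
]
```

Each sorter closes over the list of levels defined so far, which is the sketch's counterpart of the :agent sorter calling the :site sorter before comparing names.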

We now discuss the language with which formatting functions are defined. Lines of text are
output to scrolling-text-panels with the function format; in order to use this function, we
must have a way of choosing both format control strings and the expressions which are evaluated
to generate arguments for these control strings.

[1] In this figure, and in Figure B.6, tokens in this font are non-terminals, and tokens in this font are terminals.
Occurrences of “.” are Lisp “consing dots;” thus, where the grammar would ordinarily demand statements of the
form (a . (b . (c . nil))), it is acceptable to supply the form (a b c).

[2] The arguments item-1 and item-2 are bound to instances of tv:scroll-line-item structures. The internal
representation of these structures is unimportant, except that arbitrary application-program information may
be stored in their array leader sections. The first word of available storage in the array leader is found at
tv:scroll-item-leader-offset.


print-spec ::= (key-spec . print-spec) | nil
key-spec ::= (key-name . field-spec-list)
field-spec-list ::= (field-computation . field-spec-list) | nil
field-computation ::= field-arg | (field-op . field-arg-list)

field-arg-list ::= (field-arg . field-arg-list) | nil
field-op ::= any-lisp-function
key-name ::= any-lisp-symbol

field-arg ::= field-number | 'any-valued-lisp-symbol
field-number ::= any-lisp-integer

Figure  B.6:  BNF  Grammar  for  declaring  printing  functions. 


((:site . "SITE-~D-~D")
 (:agent . " ~A ~A (~D run, ~D wait)")
 (:task . "  ~A ~A ~A"))

((:site 0 1)
 (:agent 2 (car 3) 4 5)
 (:task 4 3 6))


Figure  B.7:  A  sample  formatting  specification 


Format control strings are chosen by indexing into an association list stored in the formatted-by
instance variable of the panel. Lisp expressions which generate the arguments for format are
created by parsing expressions defined by the grammar in Figure B.6, and are found in the printed-by
instance variable of the panel. The contents of these two instance variables, in an example from
the CAOS instrumentation, are illustrated in Figure B.7. The panel defined by the specifications
in Figures B.5 and B.7 will display sites in column-major order; within each site, agents will be
displayed alphabetized by name; within each agent, tasks will be displayed ordered by arrival time.
For example:

SITE-1-1
 MERGESORT-MANAGER-1 INITIALIZED (0 run, 0 wait)
 MERGESORTER-1-1 INITIALIZED (1 run, 3 wait)
  RUNNING 346700 NEIGHBOR
  NEVER-RUN 346792 MERGESORT
SITE-1-2
 MERGESORTER-1-2 INITIALIZED (0 run, 0 wait)
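The pairing of a control string with a list of field selectors can be sketched in Python as follows. This is an illustration under stated assumptions, not the panel's implementation: Python's str.format stands in for the Lisp format function, the task selector uses index 5 because the sketch's task vector has exactly six fields, and the names FORMATTED_BY, PRINTED_BY, car, and format_line are hypothetical.

```python
def car(value):
    """First element of a sequence, standing in for the Lisp car."""
    return value[0]


# Control strings, keyed like the formatted-by association list.
FORMATTED_BY = {
    "site": "SITE-{}-{}",
    "agent": " {} {} ({} run, {} wait)",
    "task": "  {} {} {}",
}

# Field selectors, keyed like the printed-by specification. An integer selects
# an element of the entry vector; a tuple applies a function to selected elements.
PRINTED_BY = {
    "site": [0, 1],
    "agent": [2, (car, 3), 4, 5],
    "task": [4, 3, 5],
}


def format_line(key, entry):
    """Build one display line for an entry vector of the given kind."""
    args = []
    for selector in PRINTED_BY[key]:
        if isinstance(selector, int):
            args.append(entry[selector])
        else:
            op, *indices = selector
            args.append(op(*(entry[i] for i in indices)))
    return FORMATTED_BY[key].format(*args)
```

Keeping the control strings and the selectors in two separate tables follows the split between the two instance variables: the layout of a line can change without touching the choice of fields.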

B.5  CAOS  Structures  and  Macros 

The file czardefns.lisp contains macro and structure definitions for the rest of the CAOS system.
request-message [Structure]

The request-message structure is a list which defines the contents of messages sent
using the various post operators of CAOS.

response-message [Structure]

The response-message structure is a list which defines the contents of messages sent
as responses to value-desired messages.


caos-time [Macro]

This macro retrieves the current simulator time, which is measured in simulator clock
units. Presently, one unit is 10 nanoseconds.

runnable-item [Structure]

The runnable-item is the CAOS scheduler’s handle on a process. Most of its structure
was described in Section 5.4. The panel-entry field holds the tv:scroll-window line
entry of the process.


contract [Resource]

Resources are Lisp objects which must be explicitly allocated and deallocated. This is
counter to the normal Lisp philosophy, but is quite useful when the extent of an object is
known. The advantage of declaring objects as resources is that large numbers of unused
copies of the objects aren’t accumulated to be reclaimed only when the garbage collector
is run. The contract resource allocates and deallocates runnable-items.


care-site-scrolling-panel-entry [Structure]

This structure is the vector which holds information for sorting and formatting care-site
entries in the scrolling-text-panel. In Figures B.5 and B.7, this structure is
referenced by printing and sorting specifications keyed by :site. The fields of the
structure are:

x, y: Coordinates of the site in the processor grid.

state: The condition of the site.

agent-scrolling-text-panel-entry [Structure]

This structure is the vector which holds information for sorting and formatting agent
entries in the scrolling-text-panel. It is referenced by printing and sorting
specifications keyed by :agent. The fields of this structure are:

x, y: Coordinates of the site upon which the agent is located.

name: The name of the agent.

state: The condition of the agent.

nrun: The number of runnable tasks in the agent.

nwait: The number of suspended tasks in the agent.

task-scrolling-panel-entry [Structure]

This structure is the vector which holds the information for sorting and formatting task
(process) entries in the scrolling-text-panel. This structure is referenced by printing
and sorting specifications keyed by :task. The fields of the structure are as follows:

x, y: Coordinates of the site upon which the task is executing.

name: The name of the agent in which the task is executing.

entry-time: The simulator time at which the task started.

state: The current state of the task.

message: The name of the message being executed by the task.

future  [Structure] 

A  future  is  a  special  object  which  represents  a  promise  of  a  value  to  be  returned  by  a 
remote  computation.  It  has  the  following  fields: 

value: When the future has a value, it is placed in this field.

msg-id: The unique id of the message which is associated with the computation which
will return a value to this future.

waiting-processes: The number of processes waiting for the future to have a value.

waiting-process-list: The list of processes waiting for the future, in tconc format.

single-assignment: A boolean field; true if the future can only be assigned a value
once.

original-message: The contents of the request-message message sent to start the
remote computation which will return a value to this future. Used when a clocked,
single-assignment future is reposted.

destinations: The destination agents to which the original message was sent; used by
repost.

multi-future [Structure]

A multi-future is a collection of futures. It is returned by the value-desired, multipost-style
messages. A multi-future contains lists of satisfied and unsatisfied futures.
Initially, all futures in a multi-future are unsatisfied; as values of remote computations
are received, unsatisfied futures are given values and moved to the list of satisfied futures.
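A minimal Python sketch of a single future may help fix the roles of the fields listed above. It is an illustration only: the class and method names are hypothetical, a plain list replaces the tconc-format waiting list, and rejecting a second assignment with an exception is an assumption about what single assignment implies.

```python
class Future:
    """A placeholder for a value that a remote computation will supply."""

    def __init__(self, msg_id, single_assignment=False):
        self.msg_id = msg_id                  # id of the message that will answer
        self.value = None
        self.satisfied = False
        self.single_assignment = single_assignment
        self.waiting_process_list = []

    def wait(self, process):
        """Register a process; return True if it must suspend."""
        if self.satisfied:
            return False                      # value already present, no wait
        self.waiting_process_list.append(process)
        return True

    def set_value(self, value):
        """Store a value and return the processes that should be reawakened."""
        if self.satisfied and self.single_assignment:
            raise ValueError("single-assignment future already has a value")
        self.value, self.satisfied = value, True
        woken, self.waiting_process_list = self.waiting_process_list, []
        return woken
```

A caller that receives True from wait would suspend itself and be resumed when set_value hands its handle back to the scheduler.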


B.6  Declaring  CAOS  Agents 

The file czardecl.lisp contains routines to declare sites and agents.

defsite [Macro]

This macro makes it possible to declare Flavor classes which implement site-global storage
within CAOS. defsite is defined in terms of defherb, and thus, it is possible to define
instance variables within site instances which support abstract data type operations.

It is conceivable that if CAOS were ever implemented on a heterogeneous array of processors,
there would be a number of site types, perhaps defined in a taxonomy.

vanilla-site [Site]

Instances of vanilla-site implement site-global storage. Each instance has the following
instance variables:

static-agent-stream-table: Contains a dictionary which maps static (named) agents
to their input stream addresses.

unresolved-agent-stream-table: Contains a dictionary which maps the names of
remote agents not yet known during initialization to the addresses of streams in
local agents which have requested the addresses of the unknown remote agent.

local-agents: A dictionary which maps the names of local agents to their addresses.

free-process-queue: A queue which holds information allowing free processes to be
reused in preference to creating new processes.

care-site: Holds a pointer to the CARE site structure for the site upon which the
care-site is located.

locale: Holds a CARE-defined structure which is created by make-locale, and which is
updated by update-locale. Each call to update-locale modifies the structure so
that a call to locale-site returns the least-recently-referenced site in the locale.
This is a simple approach to load-balancing.

incoming-stream: Holds the stream upon which the site manager listens for
site-oriented requests.
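The least-recently-referenced policy behind the locale instance variable can be sketched as a rotating queue. The Python below is a guess at the behaviour the text describes, not the CARE structure itself; the class name Locale and its internal deque are assumptions, while the method names echo locale-site and update-locale.

```python
from collections import deque


class Locale:
    """Keeps sites ordered from least to most recently referenced."""

    def __init__(self, sites):
        self._sites = deque(sites)

    def locale_site(self):
        # The site at the front has gone longest without being referenced.
        return self._sites[0]

    def update_locale(self):
        # Mark the front site as just referenced by moving it to the back.
        self._sites.rotate(-1)
```

Calling locale_site followed by update_locale for each new placement spreads work evenly over the sites, which is all this simple form of load balancing attempts.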

defagent-keyword [Macro]

This macro defines the syntax for a new keyword used in a call to defagent (see below).
The keywords described in Chapter 4, plus a number of keywords not described, are all
declared through the use of defagent-keyword.

defagent [Macro]

The defagent macro, which is defined in terms of defherb, is the basic form by which
new agents are declared. It is described in detail in Chapter 4.


defagent-method [Macro]

The defagent-method macro is syntactic sugar for defmethod, but has the advantage
of being able to define the same method for multiple message names.

clock [Abstract Data Type]

The clock ADT responds to the :rearm, :tick, and :stop messages. The value field
of a vp-slot of the clock datatype holds a list of messages to be executed when the
clock “fires.”

vanilla-agent  [Agent] 

The vanilla-agent is the most basic agent in the system. It has the following instance
variables:

local-process-stream-table: A dictionary which maps from a process handle to a
utility stream the process uses to wait for wakeup messages.

outstanding-message-table: A dictionary which maps from ids of messages to their
associated futures.

runnable-process-list: A priority queue which implements the scheduling policy
defined for the agent.

scheduler-lock: A monitor data type which is used to implement mutual exclusion
around routines which modify the agent scheduler database.

process-table: A dictionary which maps from CARE process handles to CAOS
runnable-items.

self-address: The stream upon which the agent’s input process listens for requests
and responses from other agents.

priority-queue-context: Holds information for creating nodes in the
runnable-process-list priority-queue.

care-site: Points to the care-site structure for the site upon which the agent is
located.

symbolic-name: Holds the name of the agent. Statically-created agents are named by
the application program; dynamically-created agents are named by CAOS, using
gensym.

agent-scheduler: Holds the CARE process handle of the process which is currently
performing the duties of the agent scheduler.

running-processes: Holds a list of runnable-items which represent processes handed
off to CARE for execution.

symbolically-referenced-agents: Holds a list of other agents to be referenced by
name by methods executing within the context of the agent.

initial-forms: A list of expressions to be evaluated after CAOS has been initialized.
The purpose of these forms is to initialize an application.

:select-process-fifo item-1 item-2 [Method of vanilla-agent]

This method implements FIFO scheduling of tasks within an agent. It is called as the
priority function for the priority-queue stored in the runnable-process-list.
Priorities are derived by comparing the time-stamp fields of item-1 and item-2, which are
runnable-items.

process-agenda-agent [Agent]

The process-agenda-agent is a subclass of vanilla-agent. It differs from vanilla-agent
in that certain message names may be given execution priorities. Such priorities
are defined by specifying message names in order in a list stored in the process-agenda
instance variable; messages at the front of the list have higher priority than those at the
end of the list.

:select-process-agenda-timestamp item-1 item-2 [Method of process-agenda-agent]

This method implements “agenda-based” scheduling of tasks in an agent. It is the priority
function for the runnable-process-list. Priorities are derived by first comparing
the message-name fields of item-1 and item-2; if these fields are the same, the function
then compares the time-stamp fields, as in the FIFO scheduler above.
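The two priority functions can be sketched in Python as comparison predicates over dictionaries that stand in for runnable-items. This is a hedged illustration: the dictionary keys, the function names, and the decision to rank message names absent from the agenda below every listed name are assumptions, since the report does not say how unlisted names are treated.

```python
def fifo_before(item_1, item_2):
    """FIFO policy: the item with the earlier time stamp runs first."""
    return item_1["time_stamp"] < item_2["time_stamp"]


def make_agenda_before(process_agenda):
    """Build an agenda-based priority function from an ordered list of names."""
    rank = {name: position for position, name in enumerate(process_agenda)}
    unlisted = len(rank)                      # assumption: unlisted names go last

    def before(item_1, item_2):
        rank_1 = rank.get(item_1["message_name"], unlisted)
        rank_2 = rank.get(item_2["message_name"], unlisted)
        if rank_1 != rank_2:
            return rank_1 < rank_2            # earlier in the agenda wins
        return fifo_before(item_1, item_2)    # same rank: fall back to FIFO

    return before
```

Either predicate could be handed to a priority queue as its ordering function, which is how the text describes the methods being used.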

B.7  Initializing  a  CAOS  Application 

The file czarinit.lisp contains the code which initializes CAOS at the start of a run. Initialization
occurs in two distinct phases: one, static, before the CARE simulator is started, and the other,
dynamic, just after.

The first set of functions, macros, and methods in czarinit.lisp is involved in static initialization.
During this phase, the application initialization file (see Figure 4.4 and Appendix A) is read
and interpreted. As a result of interpreting this file, all statically-declared agents are created on the
appropriate sites, and the messages which initialize the application once CAOS is running are stored
away.

:init [:after Method of care-site]

During the static phase, new care-site Flavor instances are created. The
:init method is primarily responsible for initializing all of the abstract data types which
are part of the care-site.

:init [:after Method of vanilla-agent]

When a new agent instance is created, the :init method initializes a number of abstract
data types, and also adds an entry to the appropriate care-site’s local-agents
dictionary.


make-initial-agent agent-class global-name care-site [Macro]

This macro is invoked when the caos-initialize form is interpreted. Agent-class is
the name of an agent class as defined by defagent. Global-name is the name by which
this instance of the agent class will be known throughout the processing grid. Care-site
is a two-element list specifying the x and y coordinates of the care-site upon which
the new agent will be created. When the macro is executed, an instance of agent-class
with name global-name is created on care-site.

initial-agent -record  [Structure] 

This structure defines a three-tuple with fields name, class, and location. Instances
of this tuple make up the agent-instances argument to the caos-initialize macro
(below). The initial-agent-record also defines the argument list to make-initial-agent.

caos-initialize  agent-instances  initial-messages  [Macro] 

Calls to this macro are the means by which CAOS applications are initialized.
Agent-instances is a list of initial-agent-record structures. Initial-messages is a list of
expressions to be evaluated when CAOS has finished initializing.

When a caos-initialize form is evaluated, four major activities occur.

1. All statically-declared agents are created by mapping over agent-instances and calling
make-initial-agent on each element.

2. An agent of class initial-agent is defined. The initial-agent class is a subclass
of vanilla-agent which makes reference to all other statically-declared agents.

3. An instance of the initial-agent class, called 007, is created on site (1, 1).

4. The initial-messages argument is used to define an :initial-form method for the
class initial-agent.

The remainder of czarinit.lisp is devoted to dynamic initialization. The necessary site and
agent instances were created during the static phase; during the dynamic phase, these structures
must be linked up with CARE. Dynamic initialization consists of starting the site manager processes
in each of the sites, starting the input monitor and scheduler processes in each of the agents, and
exchanging the names and addresses of each of the agents in order to resolve symbolic references.
Dynamic initialization is completed by sending agent 007 an :initial-form message.

start-czar initializer-stream [Process]

The  start-czar  process  is  the  first  process  run  once  CARE  starts.  It  drives  all  dynamic 
initialization  tasks,  as  follows: 

1.  Creates  a  site  manager  process  in  each  site. 

2.  Waits  for  each  site  manager  process  to  return  the  address  upon  which  it  listens  for 
requests. 


3.  Creates  a  process  on  each  site  that  contains  a  statically-declared  agent,  whose  task 
is  to  initialize  those  agents. 

4.  Waits  for  each  site  containing  statically-declared  agents  to  indicate  its  agents  are 
initialized. 

5. Sends the :initial-form message to the agent named 007.

start-site initializer-stream site-stream [Process]

This  process  is  the  caos  site  manager.  Upon  start-up,  it  sends  the  value  of  site-stream 
to  initializer-stream  (upon  which  the  start-czar  process  is  waiting).  It  then  enters  an 
endless  loop  in  which  it  responds  to  service  requests  directed  to  site-stream.  The  specific 
services  implemented  by  the  site  manager  were  discussed  in  Section  5.2. 

start-agents all-care-sites-list start-agents-stream [Process]

This process is responsible for initializing statically-declared agents on each site. For
each agent, it does the following:

1. Starts the input monitor process.

2. Broadcasts a :new-initial-agent-online message, containing the agent’s name
and the address upon which its input monitor process listens, to all other site
managers in the grid (the value of all-care-sites-list).

3. For each agent named in the agent’s symbolically-referenced-agents instance
variable, sends a :request-symbolic-reference message to the site manager, and
waits for a response.

4. Sends a message to the start-czar process indicating that the site is ready to run.


B.8  The  CAOS  Runtime  System 

The file czar.lisp contains the “runtime” system for CAOS. The functions documented in Sections
4.3 and 4.4 are implemented in this file. In what follows, we document those functions upon
which the functions in these sections depend.

agendize future [Defun-Method of vanilla-agent]

This is the low-level function used to suspend a process until future receives a value.
It sets the calling process’s state to :suspended, adds the process’s runnable-item to
the list of processes waiting for future, sets the context field of the runnable-item to
be the process’s wakeup stream, and sends to itself the reschedule message, which
invokes the scheduler to put the process to sleep. Upon waking up, it sets the process’s
state to :running, and returns to its caller (typically, value-future).

multi-agendize multi-future [Defun-Method of vanilla-agent]

This function is the multi-future version of agendize.

*remote-address-enumerating-functions* [Variable]

This variable holds an association list which maps ZETALISP data types into a function,
which when applied to an object of the associated type, returns a list of remote addresses.
This allows application programs built on top of CAOS to represent collections of agents
in forms other than lists.

coerce-destination dest-stream [Defun-Method of vanilla-agent]

This function coerces dest-stream, which may be a remote address, a future, or the name
of an instance variable in self, into a stream.

If dest-stream is a remote-address, it is returned unmodified. If dest-stream is a symbol,
it is evaluated in the context of self, and is expected to evaluate to a remote-address
(this is the mechanism by which application programs are able to refer to statically-declared
agents by name). Finally, if dest-stream is a future, coerce-destination
calls value-future to retrieve the destination remote-address.

list-of-remote-addresses list [Defun-Method of vanilla-agent]

This is the enumerating function for lists of remote addresses.

enumerate-destinations remote-addresses [Defun-Method of vanilla-agent]

This function uses *remote-address-enumerating-functions* to coerce
remote-addresses into a list of remote-addresses.
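The type-keyed table of enumerating functions can be illustrated with a short Python sketch. The table contents are invented for the example (a list, and a dictionary whose values are addresses), and the names ENUMERATING_FUNCTIONS and enumerate_destinations are hypothetical stand-ins for the Lisp variable and function.

```python
# Pairs of (type, function); the function turns an object of that type
# into a plain list of remote addresses.
ENUMERATING_FUNCTIONS = [
    (list, lambda addresses: list(addresses)),
    (dict, lambda table: list(table.values())),
]


def enumerate_destinations(remote_addresses):
    """Coerce a supported collection into a list of remote addresses."""
    for data_type, enumerate_fn in ENUMERATING_FUNCTIONS:
        if isinstance(remote_addresses, data_type):
            return enumerate_fn(remote_addresses)
    raise TypeError("no enumerating function for this collection type")
```

An application that keeps its agents in some other container would add one more pair to the table instead of converting to a list at every call site.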

stream-send dest-stream priority flags message args [Defun-Method of vanilla-agent]

This function is a common subfunction used by CAOS-defined posting operators. It uses
the facilities of CARE to send message and args to dest-stream with CARE priority priority.
Flags is a list which controls the operation of stream-send. The following symbols may
be included in flags:

:no-return: Causes stream-send to send a side-effect message.

:return-future: Causes stream-send to create a future, assign it a unique identifier,
send the message with self-address as the return address, and return the new
future to the caller.

:return-multi-future: Like :return-future, but causes stream-send to create and
return a multi-future instead of a future.

:single-assignment: Causes stream-send to create a single-assignment future, a
future whose value can only be set once.

make-and-initialize-future type [Defun-Method of vanilla-agent]

This function creates a new future of type type (either future or multi-future). It also
generates a unique identifier for the future in the agent’s outstanding-message-table,
and places the future in the table, keyed by the unique identifier.

format-stream-request id stream message args [Function]

This function formats a message and its arguments for transmission to another agent.
Id is the unique id of the message; stream is the stream to which answers should be
directed.

agent-input-process  agent  request-stream  [Process] 

This process monitors self-address for requests and responses
from other CAOS agents. It is created exactly once per agent, and performs the following
initialization steps:

1. Sets self-address to the value of request-stream.

2. Creates the agent scheduler process.

3. Arms all clocks in the agent.

After initializing the agent, agent-input-process enters a loop, in which it waits for
messages directed to self-address, and then processes them accordingly.

:handle-request request for-effect [Method of vanilla-agent]

This method is invoked by the input monitor process when a request message is received.
It allocates a new runnable-item, and fills in its fields by copying from request, a
request-message structure.

It then sends the new runnable-item to the scheduler process. If the scheduler is
idle when this method is invoked, the runnable-item is sent to the process in a CARE
message (this reawakens the idle scheduler); otherwise, the runnable-item is simply
enqueued on the agent’s runnable-process-list.

:handle-response response [Method of vanilla-agent]

This method is invoked when the input monitor process encounters a response-message.
It first checks if the response is directed towards a future or a multi-future. In the
latter case, it calls upon the :handle-multi-response method to process the response.
In the former case, it does the following:

1. If the future associated with the response is a single-assignment future, the future
is removed from the agent’s outstanding-message-table.

2. The value is removed from the response, and placed in the value field of the future.

3. The satisfied field of the future is set to t.

4. The :run-processes method is invoked, which restarts all processes waiting on
the future.


:handle-multi-response multi-future value source [Method of vanilla-agent]

This method is called when a response to a multi-future is received. Source is a cons
of the sending agent’s name and self-address; individual futures in the multi-future
may be keyed by either.

The method uses source to find the appropriate future in the multi-future’s
unsatisfied-future list, and places value in its value field. If the multi-future
is in :any wake-up mode, all processes waiting on the future are reawakened; if the
multi-future is in :all mode, the waiting processes are reawakened only if there are
no more unsatisfied futures.
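The difference between the two wake-up modes can be shown with a small Python sketch. It is a simplification under stated assumptions: component futures are reduced to a set of pending sources plus a dictionary of received values, sources are keyed by a single name, and the class and method names are hypothetical.

```python
class MultiFuture:
    """Tracks responses from several sources and decides when to wake waiters."""

    def __init__(self, sources, mode="all"):
        self.mode = mode                      # "any" or "all"
        self.unsatisfied = set(sources)       # sources still owing a value
        self.satisfied = {}                   # source -> received value
        self.waiting = []                     # processes suspended on this object

    def handle_response(self, source, value):
        """Record one response and return the processes to reawaken."""
        self.unsatisfied.remove(source)
        self.satisfied[source] = value
        if self.mode == "any" or not self.unsatisfied:
            woken, self.waiting = self.waiting, []
            return woken
        return []                             # "all" mode, responses outstanding
```

In "any" mode the first response releases the waiters; in "all" mode they stay suspended until the set of unsatisfied sources is empty.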

agent-scheduler  agent  scheduler- process- stream  [Process] 

This  process  is  the  caos  scheduler  process  for  agents.  It  is  written  as  a  loop  which 
performs  the  following  operations: 

1.  If  the  scheduler  has  previously  determined  that  there  are  no  runnable  processes,  or 
if  there  are  requests  waiting  in  the  runnable-procsss-strsaa,  the  scheduler  tries 
to  get  the  next  request  from  the  runnable-procsss-strsan.  If  neither  condition 
is  true,  the  scheduler  skips  to  step  3,  below. 

2.  If  the  message  is  a  symbol,  it  is  the  name  of  a  clock  which  has  just  ticked;  in  this 
case,  the  scheduler  sends  the  :tick  message  to  the  clock. 

If  the  message  is  a  runaable-itan,  it  is  a  request  to  the  scheduler  to  perform  an 
operation  on  the  associated  process.  To  be  sent  to  the  scheduler,  the  state  of  the 
process  must  be  either  :  suspended  or  :  never-run.  In  either  case,  the  scheduler 
adds  the  item  to  the  runnable-process-list. 

3.  The  scheduler  next  tries  to  hand  to  CARE  for  execution  as  many  processes  as  it  can. 

The number of processes it is allowed to run at any one time is determined by the
value of *number-of-running-agent-processes*.

4. Finally, the scheduler checks to see if any special conditions are outstanding. One
special condition is that the user has requested a breakpoint (e.g., to perform some
debugging with the CARE clock shut off). The other special condition is that it
is about to be too late to perform an immediate garbage collection; in this case,
the scheduler shuts off the CARE clock, and calls gc-immediately, the Zetalisp
function which invokes the garbage collector.
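
Steps 1 through 3 of the loop can be sketched as a single non-blocking pass. This is an illustrative Python sketch, not the CAOS implementation: the class and attribute names are hypothetical, a clock tick is merely recorded rather than sent as a :tick message, requests are modeled as (priority, process) pairs, a larger number is assumed to mean higher priority, and step 4 (breakpoints and garbage collection) is omitted. The real scheduler would wait on its stream when idle rather than return.

```python
import heapq
from collections import deque

class Scheduler:
    def __init__(self, max_running):
        self.stream = deque()      # stands in for the runnable-process-stream
        self.runnable = []         # heap standing in for the runnable-process-list
        self.running = []          # processes handed off for execution
        self.max_running = max_running
        self.ticks = []            # clock names seen (illustration only)
        self.idle = False          # True once no runnable process was found
        self._seq = 0              # tie-breaker so equal priorities stay FIFO

    def step(self):
        # Step 1: consult the stream only if idle or if requests are pending.
        if (self.idle or self.stream) and self.stream:
            msg = self.stream.popleft()
            # Step 2: a bare name is a clock tick; anything else is a request.
            if isinstance(msg, str):
                self.ticks.append(msg)
            else:
                priority, process = msg
                heapq.heappush(self.runnable, (-priority, self._seq, process))
                self._seq += 1
        # Step 3: hand off as many processes as the limit allows.
        while self.runnable and len(self.running) < self.max_running:
            _, _, process = heapq.heappop(self.runnable)
            self.running.append(process)
        self.idle = not self.runnable
```

Negating the priority lets Python's min-heap return the highest-priority request first; the limit plays the role of the global variable that bounds the number of running agent processes.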

:add-to-runnable-process-list  item  [Method  of  vanilla-agent] 

This method enqueues a runnable-item on the agent's runnable-process-list. If the
CAOS instrumentation package is enabled, it also adds a line representing the process to
the scrolling-text-panel.


:choose-next-runnable-it [Method of vanilla-agent]


This method removes the highest-priority runnable-item from the
runnable-process-list, unless the number of processes already handed to CARE is
greater than or equal to *number-of-agent-running-processes*.

If the CAOS instrumentation package is enabled, and an item was removed from the
queue, this method also removes the line representing the process from the
scrolling-text-panel.

:schedule-next-process return-new-items [Method of vanilla-agent]

This method is called by the scheduler process to hand the highest-priority process to
CARE for execution. If the state of the process is :never-run, the :create-new-process
method is invoked to create a new process. If the state of the process is :runnable, the
process is reawakened by calling the function resume-old-item.

:reschedule future [Method of vanilla-agent]

This  method  is  invoked  to  suspend  a  process  until  future  has  a  value.  It  first  updates 
the  CAOS  instrumentation,  then  tries  to  run  as  many  processes  as  possible  (to  keep  the 
processor  as  busy  as  possible),  and  then  suspends,  waiting  for  a  packet  on  its  wakeup 
stream.  Upon  reawakening,  it  updates  the  CAOS  instrumentation  once  again,  and  returns 
to  its  caller  (typically  agendize). 

:create-new-process runnable-item [Method of vanilla-agent]

This  method  is  called  to  create  a  new  application-level  process.  It  preferentially  recycles 
a  process  waiting  in  the  free-process-queue  of  the  care-site  associated  with  the 
agent.  If  there  are  no  free  processes  available,  it  creates  a  new  process  using  the  facilities 
of  CARE. 

message-handler agent runnable-item wakeup-stream [Process]

All CAOS postings execute in processes in which message-handler is the top-level function.

This  process  is  a  loop,  which  does  the  following: 

1. Executes the message and arguments contained in runnable-item, an instance of a
runnable-item.

2. Tries to pull the next runnable-item in state :never-run off the
runnable-process-list. If there is such an item, message-handler returns to step 1 with
runnable-item set to the new runnable-item.

3. Otherwise, the process queues itself on the free-process-queue of its associated
care-site, to be reused later. It does this by calling the function
wait-for-an-item.
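
The three steps above can be sketched as follows. This is a hedged Python illustration, not the Lisp source: runnable items are modeled as dictionaries with hypothetical "state", "message", and "args" fields, the handler is identified by an arbitrary handler_id, and parking on the free-process-queue is modeled by appending that id rather than by actually suspending a process.

```python
from collections import deque

def take_never_run(runnable_list):
    """Remove and return the first item in state :never-run, or None."""
    for idx, candidate in enumerate(runnable_list):
        if candidate["state"] == ":never-run":
            return runnable_list.pop(idx)
    return None

def message_handler(item, runnable_list, free_process_queue, handler_id):
    """Run postings until no :never-run item remains, then park the handler."""
    results = []
    while item is not None:
        # Step 1: execute the message with its arguments.
        results.append(item["message"](*item["args"]))
        # Step 2: try to pick up another item that has never run.
        item = take_never_run(runnable_list)
    # Step 3: nothing left to start, so queue this handler for reuse.
    free_process_queue.append(handler_id)
    return results
```

Reusing the same handler for successive :never-run items avoids creating a process per posting; suspended items are left on the list because they already own a process that the scheduler will resume.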

czar-initialize dimensions file aux-display [Function]

This function is called to start CAOS. It initializes a number of global variables, sets
up the CAOS instrumentation, and reads file, the application file which contains the
caos-initialize form.